CS:GO is a first-person shooter: a tactical video game developed by Valve Corporation and Hidden Path Entertainment, announced on September 1, 2011 and released on August 21, 2012.
The game offers several modes, the most important of which is the so-called "competitive" mode. A match is played best of 30 rounds, with overtime if the score reaches 15:15. Players are split into two teams of five: the terrorists (T) and the counter-terrorists (CT). The main mode is "bomb defusal": the terrorists aim to plant the bomb at one of the two sites on the map (within 2 minutes from the start of the round), while the counter-terrorists must defuse it before it explodes (the bomb timer is 40 seconds). Either side, however, also wins by eliminating the opposing team, unless the bomb has already been planted.
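The round-end conditions just described can be sketched as a small decision function (a simplified sketch for illustration; the names are ours and do not come from the dataset):

```python
def round_winner(bomb_planted, bomb_exploded, bomb_defused,
                 ct_alive, t_alive, time_expired):
    """Decide the winner of a bomb-defusal round (simplified sketch)."""
    if bomb_defused:
        return "CT"           # defuse always wins the round for the CTs
    if bomb_exploded:
        return "T"            # explosion always wins the round for the Ts
    if t_alive == 0 and not bomb_planted:
        return "CT"           # elimination counts only if the bomb is not down
    if ct_alive == 0:
        return "T"
    if time_expired and not bomb_planted:
        return "CT"           # Ts failed to plant within the time limit
    return None               # round still in progress
```

Note that eliminating all Ts after the plant does not end the round: the bomb keeps ticking until it is defused or explodes, exactly as stated above.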
Each round, teams are awarded a certain amount of money, which can then be spent on weapons at the start of the round. Kills also earn money, and each weapon carries a kill reward that is usually inversely proportional to the weapon's quality.
In a CS:GO match the winner is determined not by raw player skill alone, but also by how well the team manages its economy, how cohesive it is, and so on. A seemingly simple shooter can involve so many variables and so much complexity that it becomes comparable to a game of chess. Sacrificing a player, for instance, need not be an end in itself: it can be part of a broader strategy.
The audience CS:GO reaches keeps growing, not only in terms of how many people play it or simply watch matches, but also in terms of revenue. In 2020 CS:GO passed one billion dollars in revenue, not counting the marketplace that thrives around it or the myriad of sponsors that appear in professional matches.
The story does not end with ads, sponsors, and events: the game itself offers match-data analysis and visualization services, and external sites offer even more elaborate tools, all of it, naturally, behind a paywall.
import os
from IPython.display import Image
Image(url='http://media.steampowered.com/apps/csgo/blog/images/fb_image.png?v=6')
This project work is set in that context. Our goal is to use the tools of data mining to extract useful information from CS:GO match data; in particular, the main task is to predict the outcome of a round from a snapshot of it. The challenge therefore falls under binary classification: starting from a dataset of labeled tuples, the aim is to build a model that captures the patterns in the tuples and predicts the value of the target class.
As with any sport (physical or digital, i.e. e-sports), the existence of matches, especially professionally played ones, leads to the development of betting systems and, consequently, of winner-prediction systems.
Given its educational nature, the goal of this project is mainly to apply the data-mining technologies offered by Python, so as to observe in practice what was covered in the theory lectures. Although performance is not the primary goal, searching for the best approaches to extract as much information as possible from the data at our disposal will still be an integral part of this work.
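To fix ideas, the prediction task reduces to the following skeleton: split labeled snapshots into train and test sets, fit a model, and measure accuracy. A pure-Python sketch with a trivial majority-class baseline (not one of the models used later) is enough to show the shape of the pipeline:

```python
import random

def split_train_test(rows, labels, test_frac=0.2, seed=42):
    """Shuffle (X, y) pairs and split them into train and test sets."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    cut = int(len(idx) * (1 - test_frac))
    tr, te = idx[:cut], idx[cut:]
    return ([rows[i] for i in tr], [labels[i] for i in tr],
            [rows[i] for i in te], [labels[i] for i in te])

class MajorityClassifier:
    """Baseline: always predict the most frequent training label."""
    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self
    def predict(self, X):
        return [self.label_] * len(X)

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Any real classifier trained later in this work must at least beat this baseline to be of any use.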
1 Data visualization
1.1 Introduction to the dataset
1.2 The data
1.2.1 Clarifications
1.3 Importing the dataset
1.4 Enrichment
1.5 Basic information
1.6 Visualizing the distributions
1.7 Equipment
1.8 Weapons
1.9 Scatter plot
1.10 Maps
1.11 Correlations between attributes
1.12 Noise and outliers
1.13 More on correlation
1.14 Feature importance
1.15 Summary
2 Pre-processing
2.1 Null values
2.2 Categorical attributes
2.3 Outliers
2.4 Correlated attributes
2.5 Unimportant attributes
2.6 Summary
3 Classification models
3.1 Random forest
3.2 K-neighbors
3.3 SGD
3.4 Naive Bayes
3.5 Decision tree
3.6 Ensemble
3.6.1 General parameters
3.6.2 Booster parameters
3.6.3 Learning parameters
3.6.4 Tuning
4 Classifier performance
5 Neural network
5.1 Introduction
5.2 First attempt
5.3 Tuning the ANN
6 Conclusions
The dataset used comes from the following Kaggle link:
https://www.kaggle.com/datasets/christianlillelund/csgo-round-winner-classification
It consists of snapshots taken from rounds played in professional tournaments held between 2019 and 2020. The data do not cover warmup and restart phases; those tuples have been filtered out. Snapshots were taken from the matches at regular intervals. As stated by the source, the tuples were preprocessed for better readability.
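A filtering step of that kind might look as follows. This is purely illustrative: the published dataset arrives already filtered, and the "phase" column and its values are hypothetical, not part of the real data:

```python
import pandas as pd

# Hypothetical raw snapshot table with a 'phase' marker (illustrative only).
raw = pd.DataFrame({
    "time_left": [175.0, 60.0, 120.0],
    "phase": ["warmup", "live", "restart"],
})

# Keep only live-round snapshots and drop the auxiliary column.
clean = raw[raw["phase"] == "live"].drop(columns="phase").reset_index(drop=True)
```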
Personal information about the players, such as first and last names, was not included.
As already mentioned, the data are snapshots of rounds from a number of matches, which is why the following fields are present:
time_left : the time remaining in the round
ct_score : the current score for the counter-terrorists
t_score : the current score for the terrorists
map : the map the round is played on
bomb_planted : a boolean indicating whether the bomb has already been planted
ct_health : the total health of the counter-terrorist team
t_health : the total health of the terrorist team
ct_armor : the total armor of the counter-terrorists
t_armor : the total armor of the terrorists
ct_money : the total amount of money for the counter-terrorists
t_money : the total amount of money for the terrorists
ct_helmets : the total number of helmets for the CTs
t_helmets : the total number of helmets for the Ts
ct_defuse_kits : the total number of defuse kits for the CTs
ct_players_alive : the total number of CT players still alive
t_players_alive : the total number of T players still alive
Next comes a series of attributes, one per weapon (replicated for terrorists and counter-terrorists), reporting the number of players on the team currently holding that weapon.
The last feature is the target attribute, round_winner, stored as CT or T (later label-encoded to 0 for CT and 1 for T).
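Since the target arrives as a string, it must be mapped to integers before modeling. sklearn's LabelEncoder assigns codes to the classes in sorted order, which is what yields 0 for CT and 1 for T; a minimal pure-Python sketch of the same behavior:

```python
def label_encode(values):
    """Map class labels to integers in sorted order, mirroring what
    sklearn's LabelEncoder does (so 'CT' -> 0, 'T' -> 1)."""
    classes = sorted(set(values))
    mapping = {c: i for i, c in enumerate(classes)}
    return [mapping[v] for v in values], classes

codes, classes = label_encode(["CT", "T", "T", "CT"])
```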
For a better understanding of the dataset, a few clarifications are in order.
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from pandas import DataFrame
from sklearn.preprocessing import MinMaxScaler
import warnings
import re
from pandas.plotting import scatter_matrix
from pandas.api.types import is_numeric_dtype
import numpy as np
import pandas as ps
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="whitegrid")
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)
np.random.seed(42)
target_feature='round_winner'
warnings.filterwarnings('ignore')
Encoder_df = LabelEncoder()
ps.set_option('display.max_rows', None)
First, we import the libraries that will be needed throughout the project. We also define a few helper functions, such as one that removes attributes whose correlation is too high (remove_collinear_features) and others used for some of the visualizations.
def remove_collinear_features(x, threshold):
    '''
    Objective:
        Remove collinear features in a dataframe with a correlation coefficient
        greater than the threshold. Removing collinear features can help a model
        to generalize and improves the interpretability of the model.

    Inputs:
        x: features dataframe
        threshold: features with correlations greater than this value are removed

    Output:
        dataframe that contains only the non-highly-collinear features
    '''
    # Calculate the correlation matrix
    corr_matrix = x.corr()
    iters = range(len(corr_matrix.columns) - 1)
    drop_cols = []

    # Iterate through the correlation matrix and compare correlations
    for i in iters:
        for j in range(i + 1):
            item = corr_matrix.iloc[j:(j + 1), (i + 1):(i + 2)]
            col = item.columns
            row = item.index
            val = abs(item.values)

            # If correlation exceeds the threshold
            if val >= threshold:
                # Print the correlated features and the correlation value
                print(col.values[0], "|", row.values[0], "|", round(val[0][0], 2))
                drop_cols.append(col.values[0])

    # Drop one of each pair of correlated columns
    drops = set(drop_cols)
    x = x.drop(columns=drops)
    return x
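On synthetic data, the effect of this kind of routine can be illustrated compactly with the common upper-triangle idiom (an equivalent sketch on a toy frame, not a call to the function above; column names are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "b": a * 2 + rng.normal(scale=0.01, size=200),  # nearly collinear with 'a'
    "c": rng.normal(size=200),                      # independent
})

# Keep only the upper triangle so each pair is inspected once,
# then drop one column of each highly correlated pair (|corr| > 0.95).
corr = toy.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
reduced = toy.drop(columns=to_drop)
```

Here "b" is dropped because it is almost a linear copy of "a", while the independent column "c" survives.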
def plot_distributions(features):
    for i in features:
        if ds[i].value_counts().shape[0] > 20:
            # Many distinct values: overlay the two class-conditional histograms
            plt.figure(figsize=(12, 8))
            sns.distplot(ds[i][ds[target_feature] == 'CT'], hist=True, color='g',
                         label='CT', hist_kws={'edgecolor': 'black'})
            _ = sns.distplot(ds[i][ds[target_feature] == 'T'], hist=True, color='r',
                             label='T', hist_kws={'edgecolor': 'black'})
            plt.legend()
            plt.show()
        else:
            # Few distinct values: a count plot split by the target is clearer
            plt.figure(figsize=(14, 6))
            sns.countplot(x=i, hue=target_feature, data=ds, palette='coolwarm')
            plt.legend(loc='upper right')
            plt.yscale('log')
            plt.xticks(rotation=45)
            plt.show()
def plot_based_on(x, y, data, title):
    plt.figure(figsize=(15, 7))
    p = sns.barplot(x=x, y=y, data=data)
    p = plt.xticks(rotation=90)
    plt.title(title, size=20)
We then proceed to import the dataset.
file_name = "csgo_round_snapshots.csv"
ds = ps.read_csv(file_name)
ds.head()
| | time_left | ct_score | t_score | map | bomb_planted | ct_health | t_health | ct_armor | t_armor | ct_money | ... | t_grenade_flashbang | ct_grenade_smokegrenade | t_grenade_smokegrenade | ct_grenade_incendiarygrenade | t_grenade_incendiarygrenade | ct_grenade_molotovgrenade | t_grenade_molotovgrenade | ct_grenade_decoygrenade | t_grenade_decoygrenade | round_winner |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 175.00 | 0.0 | 0.0 | de_dust2 | False | 500.0 | 500.0 | 0.0 | 0.0 | 4000.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | CT |
| 1 | 156.03 | 0.0 | 0.0 | de_dust2 | False | 500.0 | 500.0 | 400.0 | 300.0 | 600.0 | ... | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | CT |
| 2 | 96.03 | 0.0 | 0.0 | de_dust2 | False | 391.0 | 400.0 | 294.0 | 200.0 | 750.0 | ... | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | CT |
| 3 | 76.03 | 0.0 | 0.0 | de_dust2 | False | 391.0 | 400.0 | 294.0 | 200.0 | 750.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | CT |
| 4 | 174.97 | 1.0 | 0.0 | de_dust2 | False | 500.0 | 500.0 | 192.0 | 0.0 | 18350.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | CT |
5 rows × 97 columns
To allow a complete visualization of the data, a few variables were introduced to extract the weapons and group them by category. In CS:GO the following classes can be identified: rifles, sniper rifles, SMGs, machine guns and shotguns, and pistols (plus the grenades, or "utilities").
In addition, each weapon was associated with a cost, corresponding to the in-game funds needed to buy it (this information was imported from an external source).
weapons=['t_weapon_galilar','t_weapon_famas','t_weapon_ak47','t_weapon_m4a4','t_weapon_m4a1s','t_weapon_sg553',
't_weapon_aug','t_weapon_g3sg1','t_weapon_scar20','t_weapon_ssg08','t_weapon_awp',
't_weapon_glock','t_weapon_usps','t_weapon_p2000','t_weapon_p250','t_weapon_cz75auto','t_weapon_fiveseven',
't_weapon_tec9','t_weapon_elite','t_weapon_deagle','t_weapon_r8revolver',
't_weapon_mac10','t_weapon_mp9','t_weapon_bizon','t_weapon_mp7','t_weapon_ump45','t_weapon_p90','t_weapon_mp5sd',
't_weapon_sawedoff','t_weapon_mag7','t_weapon_nova','t_weapon_xm1014','t_weapon_m249','t_weapon_negev',
't_grenade_decoygrenade','t_grenade_flashbang','t_grenade_smokegrenade','t_grenade_hegrenade','t_grenade_molotovgrenade',
't_grenade_incendiarygrenade',
'ct_weapon_galilar','ct_weapon_famas','ct_weapon_ak47','ct_weapon_m4a4','ct_weapon_m4a1s','ct_weapon_sg553',
'ct_weapon_aug','ct_weapon_g3sg1','ct_weapon_scar20','ct_weapon_ssg08','ct_weapon_awp',
'ct_weapon_glock','ct_weapon_usps','ct_weapon_p2000','ct_weapon_p250','ct_weapon_cz75auto','ct_weapon_fiveseven',
'ct_weapon_tec9','ct_weapon_elite','ct_weapon_deagle','ct_weapon_r8revolver',
'ct_weapon_mac10','ct_weapon_mp9','ct_weapon_bizon','ct_weapon_mp7','ct_weapon_ump45','ct_weapon_p90','ct_weapon_mp5sd',
'ct_weapon_sawedoff','ct_weapon_mag7','ct_weapon_nova','ct_weapon_xm1014','ct_weapon_m249','ct_weapon_negev',
'ct_grenade_decoygrenade','ct_grenade_flashbang','ct_grenade_smokegrenade','ct_grenade_hegrenade','ct_grenade_molotovgrenade',
'ct_grenade_incendiarygrenade',
'ct_defuse_kits']
weapon_cost_t=ps.Series(
[
2000,2050,2700,3100,2900,3000,3300,5000,5000,1700,4750,
200,200,200,300,500,500,500,400,700,600,
1050,1250,1400,1500,1200,2350,1500,
1100,1300,1050,2000,5200,1700,
50,200,300,300,400,600,
]
,index=[
't_weapon_galilar','t_weapon_famas','t_weapon_ak47','t_weapon_m4a4','t_weapon_m4a1s','t_weapon_sg553',
't_weapon_aug','t_weapon_g3sg1','t_weapon_scar20','t_weapon_ssg08','t_weapon_awp',
't_weapon_glock','t_weapon_usps','t_weapon_p2000','t_weapon_p250','t_weapon_cz75auto','t_weapon_fiveseven',
't_weapon_tec9','t_weapon_elite','t_weapon_deagle','t_weapon_r8revolver',
't_weapon_mac10','t_weapon_mp9','t_weapon_bizon','t_weapon_mp7','t_weapon_ump45','t_weapon_p90','t_weapon_mp5sd',
't_weapon_sawedoff','t_weapon_mag7','t_weapon_nova','t_weapon_xm1014','t_weapon_m249','t_weapon_negev',
't_grenade_decoygrenade','t_grenade_flashbang','t_grenade_smokegrenade','t_grenade_hegrenade','t_grenade_molotovgrenade',
't_grenade_incendiarygrenade',
]
)
weapon_cost_ct=ps.Series(
[
2000,2050,2700,3100,2900,3000,3300,5000,5000,1700,4750,
200,200,200,300,500,500,500,400,700,600,
1050,1250,1400,1500,1200,2350,1500,
1100,1300,1050,2000,5200,1700,
50,200,300,300,400,600,
400
]
,index=[
'ct_weapon_galilar','ct_weapon_famas','ct_weapon_ak47','ct_weapon_m4a4','ct_weapon_m4a1s','ct_weapon_sg553',
'ct_weapon_aug','ct_weapon_g3sg1','ct_weapon_scar20','ct_weapon_ssg08','ct_weapon_awp',
'ct_weapon_glock','ct_weapon_usps','ct_weapon_p2000','ct_weapon_p250','ct_weapon_cz75auto','ct_weapon_fiveseven',
'ct_weapon_tec9','ct_weapon_elite','ct_weapon_deagle','ct_weapon_r8revolver',
'ct_weapon_mac10','ct_weapon_mp9','ct_weapon_bizon','ct_weapon_mp7','ct_weapon_ump45','ct_weapon_p90','ct_weapon_mp5sd',
'ct_weapon_sawedoff','ct_weapon_mag7','ct_weapon_nova','ct_weapon_xm1014','ct_weapon_m249','ct_weapon_negev',
'ct_grenade_decoygrenade','ct_grenade_flashbang','ct_grenade_smokegrenade','ct_grenade_hegrenade','ct_grenade_molotovgrenade',
'ct_grenade_incendiarygrenade',
'ct_defuse_kits'
]
)
ct_weapon = []
ct_utilities = []
t_utilities = []
t_weapon = []
regex = '^t'
for w in weapons:
    # Side-specific default pistols and the defuse kit are handled
    # separately (see secondary_ct / secondary_t below)
    if w in ('ct_weapon_usps', 'ct_weapon_p2000', 't_weapon_glock', 'ct_defuse_kits'):
        continue
    if re.search(regex, w):
        if 'grenade' in w:
            t_utilities.append(w)
        else:
            t_weapon.append(w)
    else:
        if 'grenade' in w:
            ct_utilities.append(w)
        else:
            ct_weapon.append(w)
machine_guns_shotguns=[ 'ct_weapon_mag7', 't_weapon_mag7','ct_weapon_nova', 't_weapon_nova','ct_weapon_sawedoff', 't_weapon_sawedoff'
,'ct_weapon_xm1014','t_weapon_xm1014','ct_weapon_m249', 't_weapon_m249','ct_weapon_negev', 't_weapon_negev']
pistols=['ct_weapon_cz75auto', 't_weapon_cz75auto', 'ct_weapon_elite',
't_weapon_elite','ct_weapon_glock','ct_weapon_r8revolver','t_weapon_r8revolver',
'ct_weapon_deagle', 't_weapon_deagle','ct_weapon_fiveseven', 't_weapon_fiveseven',
't_weapon_usps', 'ct_weapon_p250', 't_weapon_p250',
't_weapon_p2000', 'ct_weapon_tec9', 't_weapon_tec9']
smg=['ct_weapon_bizon', 't_weapon_bizon','ct_weapon_mac10', 't_weapon_mac10',
'ct_weapon_mp5sd', 't_weapon_mp5sd','ct_weapon_mp7', 't_weapon_mp7',
'ct_weapon_mp9', 't_weapon_mp9','ct_weapon_p90', 't_weapon_p90',
'ct_weapon_ump45', 't_weapon_ump45']
sniper_rifles=['ct_weapon_awp', 't_weapon_awp','ct_weapon_g3sg1', 't_weapon_g3sg1', 'ct_weapon_scar20', 't_weapon_scar20',
'ct_weapon_ssg08', 't_weapon_ssg08']
rifles=['ct_weapon_ak47', 't_weapon_ak47', 'ct_weapon_aug', 't_weapon_aug',
'ct_weapon_famas', 't_weapon_famas','ct_weapon_galilar',
't_weapon_galilar','ct_weapon_m4a1s', 't_weapon_m4a1s','ct_weapon_m4a4', 't_weapon_m4a4',
'ct_weapon_sg553','t_weapon_sg553'
]
machine_guns_shotguns_ct=['ct_weapon_mag7','ct_weapon_nova', 'ct_weapon_sawedoff',
'ct_weapon_xm1014','ct_weapon_m249', 'ct_weapon_negev', ]
machine_guns_shotguns_t=['t_weapon_mag7','t_weapon_nova', 't_weapon_sawedoff',
't_weapon_xm1014','t_weapon_m249', 't_weapon_negev', ]
pistols_ct=['ct_weapon_cz75auto', 'ct_weapon_elite',
'ct_weapon_glock','ct_weapon_r8revolver',
'ct_weapon_deagle','ct_weapon_fiveseven','ct_weapon_p250', 'ct_weapon_tec9' ]
pistols_t=['t_weapon_cz75auto', 't_weapon_elite','t_weapon_r8revolver','t_weapon_p2000',
't_weapon_deagle','t_weapon_fiveseven','t_weapon_p250', 't_weapon_tec9','t_weapon_usps']
smg_ct=['ct_weapon_bizon', 'ct_weapon_mac10',
'ct_weapon_mp5sd', 'ct_weapon_mp7',
'ct_weapon_mp9', 'ct_weapon_p90',
'ct_weapon_ump45', ]
smg_t=['t_weapon_bizon', 't_weapon_mac10',
't_weapon_mp5sd', 't_weapon_mp7',
't_weapon_mp9', 't_weapon_p90',
't_weapon_ump45', ]
sniper_rifles_ct=['ct_weapon_awp','ct_weapon_g3sg1', 'ct_weapon_scar20',
'ct_weapon_ssg08']
sniper_rifles_t=['t_weapon_awp','t_weapon_g3sg1', 't_weapon_scar20',
't_weapon_ssg08']
rifles_ct=['ct_weapon_ak47', 'ct_weapon_aug',
'ct_weapon_famas', 'ct_weapon_galilar',
'ct_weapon_m4a1s', 'ct_weapon_m4a4',
'ct_weapon_sg553'
]
rifles_t=['t_weapon_ak47', 't_weapon_aug',
't_weapon_famas', 't_weapon_galilar',
't_weapon_m4a1s', 't_weapon_m4a4',
't_weapon_sg553'
]
primary_ct=[*rifles_ct,*sniper_rifles_ct,*smg_ct,*machine_guns_shotguns_ct]
secondary_ct=[*pistols_ct,'ct_weapon_usps','ct_weapon_p2000']
primary_t=[*rifles_t,*sniper_rifles_t,*smg_t,*machine_guns_shotguns_t]
secondary_t=[*pistols_t,'t_weapon_glock']
weapon_classes=[rifles,sniper_rifles,smg,machine_guns_shotguns,pistols]
weapon_classes_ct=[rifles_ct,sniper_rifles_ct,smg_ct,machine_guns_shotguns_ct,pistols_ct]
weapon_classes_t=[rifles_t,sniper_rifles_t,smg_t,machine_guns_shotguns_t,pistols_t]
We decided to enrich the data already at our disposal. Having access to the prices, we computed, for each round, the total equipment cost of the two teams. The assumption here is that the better-equipped team is more likely to win the round.
def equipment_cost_t(row: ps.Series):
    return row[weapons].multiply(weapon_cost_t, fill_value=0).values.sum()

def equipment_cost_ct(row: ps.Series):
    return row[weapons].multiply(weapon_cost_ct, fill_value=0).values.sum()

ds['ct_equipment_cost'] = ds.apply(equipment_cost_ct, axis=1)
ds['t_equipment_cost'] = ds.apply(equipment_cost_t, axis=1)
ds['equipment_difference'] = ds['ct_equipment_cost'] - ds['t_equipment_cost']
ds['ct_equipment_per_player'] = ds['ct_equipment_cost'] / ds['ct_players_alive']
ds['t_equipment_per_player'] = ds['t_equipment_cost'] / ds['t_players_alive']
ds['ct_equipment_per_player'].fillna(0, inplace=True)
ds['t_equipment_per_player'].fillna(0, inplace=True)
ds['score_difference'] = ds['ct_score'] - ds['t_score']
ds['money_difference'] = ds['ct_money'] - ds['t_money']
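The per-row apply above is equivalent to a single matrix product between the weapon-count columns and the cost vector, which avoids iterating over 122,410 rows. A vectorized sketch on toy data (column names and costs are illustrative; on the real data it would be ds[weapon_cost_t.index].dot(weapon_cost_t)):

```python
import pandas as pd

# Illustrative cost vector and snapshot counts (two rounds, two weapons).
costs = pd.Series({"t_weapon_ak47": 2700, "t_weapon_glock": 200})
snapshots = pd.DataFrame({
    "t_weapon_ak47": [2, 0],
    "t_weapon_glock": [3, 5],
})

# counts (n_rows x n_weapons) dot costs (n_weapons,) -> one total per row
totals = snapshots[costs.index].dot(costs)
```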
The updated dataset is therefore the following.
ds.head()
| | time_left | ct_score | t_score | map | bomb_planted | ct_health | t_health | ct_armor | t_armor | ct_money | ... | ct_grenade_decoygrenade | t_grenade_decoygrenade | round_winner | ct_equipment_cost | t_equipment_cost | equipment_difference | ct_equipment_per_player | t_equipment_per_player | score_difference | money_difference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 175.00 | 0.0 | 0.0 | de_dust2 | False | 500.0 | 500.0 | 0.0 | 0.0 | 4000.0 | ... | 0.0 | 0.0 | CT | 1000.0 | 1000.0 | 0.0 | 200.0 | 200.0 | 0.0 | 0.0 |
| 1 | 156.03 | 0.0 | 0.0 | de_dust2 | False | 500.0 | 500.0 | 400.0 | 300.0 | 600.0 | ... | 0.0 | 0.0 | CT | 1400.0 | 1600.0 | -200.0 | 280.0 | 320.0 | 0.0 | -50.0 |
| 2 | 96.03 | 0.0 | 0.0 | de_dust2 | False | 391.0 | 400.0 | 294.0 | 200.0 | 750.0 | ... | 0.0 | 0.0 | CT | 1200.0 | 1400.0 | -200.0 | 300.0 | 350.0 | 0.0 | 250.0 |
| 3 | 76.03 | 0.0 | 0.0 | de_dust2 | False | 391.0 | 400.0 | 294.0 | 200.0 | 750.0 | ... | 0.0 | 0.0 | CT | 1200.0 | 800.0 | 400.0 | 300.0 | 200.0 | 0.0 | 250.0 |
| 4 | 174.97 | 1.0 | 0.0 | de_dust2 | False | 500.0 | 500.0 | 192.0 | 0.0 | 18350.0 | ... | 0.0 | 0.0 | CT | 1400.0 | 1000.0 | 400.0 | 280.0 | 200.0 | 1.0 | 7600.0 |
5 rows × 104 columns
The dataset contains 122,410 tuples and 104 attributes.
ds.shape
(122410, 104)
In particular, most features are numeric (float64); there are 2 object columns (the strings identifying the map and the round winner) and one boolean (bomb_planted).
ds.dtypes
Output of ds.dtypes (abridged): every column from time_left through money_difference is float64, except map and round_winner (object) and bomb_planted (bool).
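Rather than scanning the full listing, the dtype mix can be summarized in one call with value_counts (shown here on a toy frame for illustration):

```python
import pandas as pd

# Toy frame with the same three dtypes found in the real dataset.
toy = pd.DataFrame({
    "time_left": [175.0],       # float64
    "map": ["de_dust2"],        # object
    "bomb_planted": [False],    # bool
})

# Count how many columns there are of each dtype.
summary = toy.dtypes.value_counts()
```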
We can now visualize the distributions of some attributes against the value of the target feature: whether the bomb has been planted, HE grenade counts, the scores, the time left, armor, the map, money, and equipment cost.
There are several hypotheses we will want to confirm during our investigation. In particular, we believe there is some link between the bomb having been planted and each team's win rate: if the Ts manage to take control of a bomb site, they are more likely to win the round.
The number of HE grenades could also be an indicator of how the Ts are managing their economy, i.e., a high number of grenades would imply more available money and, therefore, a higher probability of winning.
As for the maps, it is possible that they are not fully balanced, i.e., some maps favor one team over the other; we would like to investigate this aspect in more detail to see whether such hypotheses are confirmed.
The current score could also favor one side or the other: if one team's score is much higher than the other's, that trend might repeat itself in the current round.
A similar argument applies to the quality of the equipment.
A further aspect to analyze is the impact armor can have on the course of a match: it clearly helps, but by how much?
important=['bomb_planted']
plot_distributions(important)
plot_distributions(['t_grenade_hegrenade','ct_grenade_hegrenade'])
plot_distributions(['ct_score','t_score','score_difference'])
plot_distributions(['time_left'])
plot_distributions(['t_armor','ct_armor'])
plot_distributions(['map'])
plot_distributions(['ct_money', 't_money'])
plot_distributions(['ct_equipment_cost','t_equipment_cost','equipment_difference'])
plot_distributions(['ct_equipment_per_player','t_equipment_per_player'])
The information emerging from these plots can be investigated further.
Let us first check who spends more on their equipment.
plt.figure(figsize=(6,6))
total_equip_ct=ds['ct_equipment_cost'].sum()
total_equip_t=ds['t_equipment_cost'].sum()
values=[total_equip_t,total_equip_ct]
labels=['T','CT']  # labels must follow the order of `values` (T total first)
y=np.array(values)
plt.pie(y, labels = labels,autopct='%1.1f%%',explode=[0,0],shadow=True, startangle=90)
plt.title('Total equipment cost')
plt.show()
From the pie chart above it is clear that, in total, the Ts tend to have more expensive equipment. Let us now see how that money is distributed.
plt.figure(figsize=(10,5))
sns.kdeplot(ds['ct_equipment_cost'], color="b", shade = True,label='CT')
_=sns.kdeplot(ds['t_equipment_cost'],color="r",shade=False,label='T')
plt.xlabel("Equipment")
plt.ylabel("Frequency")
plt.title('Equipment distribution',size = 20)
Text(0.5, 1.0, 'Equipment distribution')
Note that the Ts tend to have equipment that is on average less expensive, but with greater consistency. The CTs manage to reach equipment whose total cost approaches $30,000, but with a lower frequency.
We can hypothesize that the CTs' spending peaks correspond to phases of the match in which their economy is well established, i.e., they have been winning for many rounds. We can verify this by focusing, for instance, on the $20,000 to $35,000 range and checking the two teams' average scores there.
ds_sub=ds[(ds['ct_equipment_cost']>=20000) & (ds['ct_equipment_cost']<=35000)]
plt.bar(['CT', 'T'], [ds_sub['ct_score'].mean(), ds_sub['t_score'].mean()])
plt.title('Mean score for a specific range of rows')
plt.show()
Evidently, in that range, the CTs have a higher average score. In general, in fact, average scores turn out to be fairly balanced, as the following plot shows.
plt.bar(['CT', 'T'], [ds['ct_score'].mean(), ds['t_score'].mean()])
plt.title('Mean score')
plt.show()
We have seen how the teams spend their money, but how do they behave when it comes to saving it, i.e., between one round and the next, how much money does each team hold?
plt.figure(figsize=(10,5))
sns.kdeplot(ds['ct_money'], color="b", shade = True,label='CT')
_=sns.kdeplot(ds['t_money'],color='r',shade=False,label='T')
plt.xlabel("Money")
plt.ylabel("Frequency")
plt.title('Money distribution',size = 20)
Text(0.5, 1.0, 'Money distribution')
The plot shows that the Ts tend to spend less than the CTs, or in any case have more money available, which translates into more expensive equipment overall and more funds saved. How can we explain such a scenario?
It is possible that, since the Ts play the active role in a round and are forced to attack the CTs' positions, the game tends to supply the Ts with more money, so that the defenders do not enjoy an economic advantage on top of their positional one.
The dataset also shows a strong concentration of snapshots taken at the start of a round, which may be explained by the fact that rounds can end ahead of time when one team is completely eliminated.
plt.figure(figsize=(10,5))
sns.histplot(ds['time_left'], color="b")
plt.xlabel("Time")
plt.ylabel("Frequency")
plt.title('Time distribution',size = 20)
Text(0.5, 1.0, 'Time distribution')
Having dealt with the economic side, let us now turn to weapon usage.
def plot_weapon(weapons, title):
    plt.figure(figsize=(20, 10))
    tmp = ds[weapons].sum().sort_values(ascending=False)
    tmp2 = ps.DataFrame(tmp, columns=['value'])
    tmp2['weapon'] = tmp2.index
    tmp2 = tmp2[tmp2['value'] > 1500]  # hide weapons that are almost never used
    p = sns.barplot(x=tmp2['weapon'], y=tmp2['value'])
    p = plt.xticks(rotation=90)
    plt.title("Weapon usage " + title, size=20)

def plot_weapon_based_map(weapons, map, title):
    plt.figure(figsize=(20, 10))
    tmp = ds[ds['map'] == map][weapons].sum().sort_values(ascending=False)
    tmp2 = ps.DataFrame(tmp, columns=['value'])
    tmp2['weapon'] = tmp2.index
    tmp2 = tmp2[tmp2['value'] > 1500]
    p = sns.barplot(x=tmp2['weapon'], y=tmp2['value'])
    p = plt.xticks(rotation=90)
    plt.title("Weapon usage " + title, size=20)
First, let us check which weapons are most used by the CTs and Ts.
plot_weapon(ct_weapon,'CT')
plot_weapon(t_weapon,'T')
Both teams tend to favor the rifle class, with particular interest then in sniper rifles such as the AWP and pistols such as the Desert Eagle, which are indeed considered among the weapons with the best cost-benefit ratio.
As for utility usage (grenades), flashbangs stand out (probably used most often for their versatility and relatively low cost), together with smoke grenades (which, along with incendiary grenades, are among the most important strategic tools available to a team).
plot_weapon(ct_utilities,' CT utilities')
plot_weapon(t_utilities,'T utilities')
We can make the following hypothesis: although for the Ts the AWP (a sniper rifle) is in general the fourth favorite weapon, this relation may change with the context. Consider a map with more open spaces, one that gives the Ts an advantageous position for sniper rifles from the very start of the round, such as Dust 2. We can expect this strategic aspect to have consequences on weapon usage.
Image(url='https://static.wikia.nocookie.net/cswikia/images/4/40/De_dust2_radar.png/revision/latest?cb=20201206162746')
plot_weapon_based_map(t_weapon,'de_dust2','T on map Dust 2')
Indeed, we can verify that in this particular context the AWP moves from fourth to third position. Does the same happen on other maps? Let us consider Nuke.
plot_weapon_based_map(t_weapon,'de_nuke',' T on map Nuke')
Here, instead, the AWP drops to fifth position. This may be due to the strong position, especially for snipers, that the map gives the CTs, which makes this map less useful for a terrorist AWPer.
As for overall utility usage, we observe an almost perfect balance: grenades serve the CTs to defend just as much as they serve the Ts to attack and take the bomb site.
plt.figure(figsize=(6,6))
total_utilities_ct=ds[ct_utilities].sum()
total_utilities_t=ds[t_utilities].sum()
values=[total_utilities_ct.values.sum(),total_utilities_t.values.sum()]
labels=['CT','T']
y=np.array(values)
print(y)
plt.pie(y, labels = labels,autopct='%1.1f%%',explode=[0,0],shadow=True, startangle=90)
plt.title('Total utilities usage')
plt.show()
[657433. 651364.]
The total number of weapons likewise shows nothing remarkable; the balance is almost perfect.
plt.figure(figsize=(6,6))
total_weapon_ct=ds[ct_weapon].sum()
total_weapon_t=ds[t_weapon].sum()
values=[total_weapon_ct.values.sum(),total_weapon_t.values.sum()]
labels=['CT','T']
y=np.array(values)
print(y)
plt.pie(y, labels = labels,autopct='%1.1f%%',explode=[0,0],shadow=True, startangle=90)
plt.title('Total weapon')
plt.show()
[449434. 451666.]
Returning to the economic question, the Ts hold a larger amount of money, as we had already hypothesized.
plt.figure(figsize=(6,6))
total_money_ct=ds['ct_money'].sum()
total_money_t=ds['t_money'].sum()
values=[total_money_ct,total_money_t]
labels=['CT','T']
y=np.array(values)
plt.pie(y, labels = labels,autopct='%1.1f%%',explode=[0,0],shadow=True, startangle=90)
plt.title('Total money')
plt.show()
Let us now see how purchasing preferences vary between the two teams: which categories are the most popular, and what are the differences between T and CT?
fig, (ax1, ax2, ax0) = plt.subplots(1, 3, figsize=(20, 20))
labels = ['rifles', 'sniper rifles', 'smg', 'mg & shotguns', 'pistols']
explode = np.zeros(5)

values = []
for x in weapon_classes:
    values.append(ds[x].sum().values.sum())
ax0.pie(np.array(values), labels=labels, autopct='%1.1f%%', explode=explode, shadow=True, startangle=90)
ax0.set_title('Weapons distribution')

values = []
for x in weapon_classes_ct:
    values.append(ds[x].sum().values.sum())
ax1.pie(np.array(values), labels=labels, autopct='%1.1f%%', explode=explode, shadow=True, startangle=90)
ax1.set_title('Weapons distribution CT')

values = []
for x in weapon_classes_t:
    values.append(ds[x].sum().values.sum())
ax2.pie(np.array(values), labels=labels, autopct='%1.1f%%', explode=explode, shadow=True, startangle=90)
ax2.set_title('Weapons distribution T')
Looking at the overall distribution, rifles dominate, with pistols in second place, followed by sniper rifles, SMGs, and finally machine guns and shotguns. This distribution remains nearly unchanged between T and CT, although some differences can be seen. In particular, the CTs tend to buy more sniper rifles than the Ts. This can be traced back to the roles played by the CTs (the defenders, who hold defensive positions and can therefore afford more static weapons such as sniper rifles) and the Ts (the attackers, who must favor more mobile weapons).
We now ask how the presence of a defuse kit affects rounds in which the bomb has been planted; the natural assumption is that, in such situations, the CTs' probability of success is considerably higher.
print(ds[(ds['bomb_planted']==True) & (ds['ct_defuse_kits']>0)].shape[0]/ds[(ds['bomb_planted']==True) ].shape[0])
0.5700087693656826
Only in 57% of the cases in which the bomb was planted is at least one defuse kit available.
wins_kit=ds[(ds['bomb_planted']==True) & (ds['ct_defuse_kits']>0) & (ds[target_feature]=='CT')].shape[0]
wins_no_kit=ds[(ds['bomb_planted']==True) & (ds['ct_defuse_kits']==0) & (ds[target_feature]=='CT')].shape[0]
lost_kit=ds[(ds['bomb_planted']==True) & (ds[target_feature]=='T') & (ds['ct_defuse_kits']>0)].shape[0]
lost_no_kit=ds[(ds['bomb_planted']==True) & (ds[target_feature]=='T') & (ds['ct_defuse_kits']==0) ].shape[0]
labels = ['Wins with kit', 'Wins no kit', 'Lost kit', 'Lost no kit']
sizes = [wins_kit, wins_no_kit, lost_kit, lost_no_kit]
fig1, ax1 = plt.subplots(figsize=(6, 6))
p = ax1.pie(sizes, labels=labels, autopct='%1.1f%%', shadow=True, startangle=90)
ax1.axis('equal')
ax1.set_title("Win percentage", size=20)
plt.show()
wins_kit/(wins_kit+wins_no_kit)
0.7548387096774194
(wins_kit+wins_no_kit)/(wins_no_kit+wins_kit+lost_kit+lost_no_kit)
0.22654194679918152
We thus see that defuse kits do help the CTs win: of all CT wins in which the bomb had been planted, 75% occur when at least one kit is present.
Defuse kits are therefore relatively important for securing a CT victory, although their weight is not as strong as one might suppose: only about 22.7% of the rounds in which the bomb is planted end with an anti-terrorist win.
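The effect of the kit can also be expressed as a conditional win rate, comparing P(CT win | bomb planted) with and without at least one kit. A minimal sketch, assuming columns named as in ds ('bomb_planted', 'ct_defuse_kits', 'round_winner' with string labels); the demo dataframe is synthetic.

```python
import pandas as pd

def ct_win_rate(ds, with_kit):
    # Estimate P(CT wins | bomb planted, kit present/absent) over snapshots.
    planted = ds[ds['bomb_planted'] == True]
    if with_kit:
        subset = planted[planted['ct_defuse_kits'] > 0]
    else:
        subset = planted[planted['ct_defuse_kits'] == 0]
    return (subset['round_winner'] == 'CT').mean()

# Synthetic snapshots: 4 planted rounds, 2 with a kit (hypothetical data).
demo = pd.DataFrame({
    'bomb_planted': [True, True, True, True, False],
    'ct_defuse_kits': [1, 2, 0, 0, 1],
    'round_winner': ['CT', 'T', 'T', 'T', 'CT'],
})
# With a kit the CTs win 1 of 2 planted rounds; without, 0 of 2.
```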
Finally, we plot the distribution of defuse kit usage.
plot_distributions(['ct_defuse_kits'])
Below are some plots that combine a few attributes to check for possible relationships between them.
Part of a player's armor is determined by the helmet: players can in fact purchase kevlar plus helmet, so these two attributes may be strongly correlated.
p = sns.jointplot(y='t_helmets', x='t_armor', data=ds, height=8)
p = plt.xticks(rotation=90)
Indeed, if the value of t_armor is too low, the team will not have many helmets available.
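The visual impression from the jointplot can be quantified with a Pearson correlation coefficient, which pandas computes directly. A minimal sketch on synthetic armor/helmet data; on the real dataset the call would be `ds['t_armor'].corr(ds['t_helmets'])`.

```python
import pandas as pd

# Hypothetical team snapshots where helmets scale exactly with armor value.
demo = pd.DataFrame({
    't_armor': [0, 100, 300, 500],
    't_helmets': [0, 1, 3, 5],
})
r = demo['t_armor'].corr(demo['t_helmets'])  # Pearson correlation coefficient
# For this perfectly linear toy data, r is exactly 1.0.
```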
A similar argument applies to the team's total health and the number of players currently alive; additionally, we can see how these attributes relate to the two teams' win percentage.
p = sns.jointplot(x='ct_health', y='ct_players_alive', data=ds, hue=target_feature, height=8)
p = plt.xticks(rotation=90)
The correlation is evident: the more players are alive, the higher the team's total health; in particular, the higher these attributes, the higher the CTs' win probability. An analogous argument could be made for the Ts.
As for the T and CT scores compared side by side, the plot below clearly shows that most rounds are played with balanced scores between 0 and 15; higher values correspond to overtime. Nothing remarkable emerges in terms of correlation between the features. Overtime is in fact played as best of 6, repeated until one of the two teams prevails over the other.
This explains the shape of the plot once the 15-round threshold per team is crossed: for the game to continue into overtime, the score gap between the two teams cannot be large.
p = sns.jointplot(x='t_score', y='ct_score', data=ds, hue=target_feature, height=8)
p = plt.xticks(rotation=90)
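The overtime claim can also be checked numerically: among snapshots where either score exceeds 15, the absolute score gap should stay small. A sketch on synthetic scores; on the real data one would filter ds with the same condition.

```python
import pandas as pd

# Hypothetical score snapshots: two overtime rounds, one regular round.
demo = pd.DataFrame({
    't_score':  [16, 18, 3],
    'ct_score': [15, 17, 12],
})
# Overtime snapshots: at least one side past the 15-round threshold.
overtime = demo[(demo['t_score'] > 15) | (demo['ct_score'] > 15)]
gap = (overtime['t_score'] - overtime['ct_score']).abs().max()
# In this toy data the overtime score gap never exceeds 1.
```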
Let us now investigate the relationship between the CTs' total armor and their total equipment cost. It is clear that high values of ct_armor require high values of ct_equipment_cost. Moreover, confirming the importance of the total armor value, at low values the win share favors the Ts, while as armor increases the balance shifts in favor of the CTs.
p = sns.jointplot(x='ct_armor', y='ct_equipment_cost', data=ds, hue=target_feature, height=8)
p = plt.xticks(rotation=90)
Let us now focus on the relationship between equipment cost and the number of AK-47s, the most popular weapon among the Ts. A certain correlation between the two attributes is evident; moreover, the higher the number of AKs, the greater the concentration of terrorist wins.
plt.figure(figsize=(8,8))
p = sns.scatterplot(x = 't_equipment_cost', y = 't_weapon_ak47', data = ds,hue=target_feature)
p = plt.xticks(rotation=90)
Considering instead the relationship between the score difference and the equipment difference, the latter is clearly more important: when it favors the Ts, their win percentage is markedly higher. The same cannot be said for the score difference.
plt.figure(figsize=(8,8))
p = sns.scatterplot(y = 'score_difference', x = 'equipment_difference', data = ds,hue=target_feature)
p = plt.xticks(rotation=90)
Let us now move on to the map analysis. First, let us look at the distribution.
plt.figure(figsize=(8, 8))
values = []
labels = ds['map'].unique()
for x in labels:
    values.append(ds[ds['map'] == x].shape[0])
y = np.array(values)
explode = np.zeros(labels.shape[0])
plt.pie(y, labels=labels, autopct='%1.1f%%', explode=explode, shadow=True, startangle=90)
plt.title('Maps distribution')
plt.show()
The most played map is "inferno", while the least played is "cache".
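The same counts can be obtained without an explicit loop via `value_counts`, which can also normalize to the fractions shown in the pie chart. A sketch on a synthetic 'map' series; the real call would be `ds['map'].value_counts()`.

```python
import pandas as pd

# Hypothetical sequence of round snapshots labeled by map.
demo = pd.Series(['de_inferno', 'de_dust2', 'de_inferno'], name='map')
counts = demo.value_counts()                 # absolute frequency per map
shares = demo.value_counts(normalize=True)   # fractions, as in the pie chart
```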
Finally, we visualize the overall win percentage between T and CT.
labels = ['CT', 'T']
sizes = ds[target_feature].value_counts()
fig1, ax1 = plt.subplots(figsize=(6, 6))
p = ax1.pie(sizes, labels=labels, autopct='%1.1f%%', shadow=True, startangle=90)
ax1.axis('equal')
ax1.set_title("Win percentage", size=20)
plt.show()
The values of the target class are almost perfectly balanced; we only note a slight skew toward the CTs, who win 51% of the rounds.
Let us now take a more general look at the correlation between the various attributes by plotting the correlation matrix.
plt.figure(figsize=(15,15))
corr_matrix=ds.corr()
_=sns.heatmap(corr_matrix,cmap='PiYG')
The matrix does not show many areas of strong correlation; naturally there are some critical attributes, which will be dealt with in the pre-processing phase.
As a first analysis, we can display the scatter matrix for the equipment difference, the players' health values, and the time remaining.
scatter_matrix(ds[['equipment_difference', 'ct_equipment_cost', 't_equipment_cost',
                   'ct_health', 't_health', 'time_left']],
               figsize=(15, 15), alpha=0.2)
plt.show()
Here we can see more clearly the correlation between the equipment difference and the equipment costs.
We can also look for correlations between the teams' equipment costs and their armor.
pp=sns.pairplot(ds,x_vars=['t_equipment_cost', 't_armor',
'ct_equipment_cost', 'ct_armor'],
y_vars=['t_equipment_cost', 't_armor',
'ct_equipment_cost', 'ct_armor'],
hue=target_feature,
diag_kind='kde',
diag_kws={'alpha':0.2})
pp.fig.set_size_inches(24,12)
Some attributes clearly show correlations; this issue will be addressed and resolved in the next section.
The data analysis also revealed the presence of an outlier in the number of players alive (the valid range is [0, 5]). We do not know the cause of this event: it could be a simple error, or something useful for our analysis; to be safe, this tuple will be removed during the transformation phase.
print(len(ds[ds['t_players_alive']>5]))
print(len(ds[ds['ct_players_alive']>5]))
1
0
Only one tuple exhibits these characteristics.
Let us look for further outliers by checking boundary conditions. We check how many T wins occur when no one on the team is still alive and the bomb has not been planted (an impossible case).
ds[(ds['t_players_alive']==0) & (ds['round_winner']=='T') & (ds['bomb_planted']==False)]
| time_left | ct_score | t_score | map | bomb_planted | ct_health | t_health | ct_armor | t_armor | ct_money | ... | ct_grenade_decoygrenade | t_grenade_decoygrenade | round_winner | ct_equipment_cost | t_equipment_cost | equipment_difference | ct_equipment_per_player | t_equipment_per_player | score_difference | money_difference | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 29725 | 167.0 | 1.0 | 6.0 | de_inferno | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | T | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -5.0 | 0.0 |
1 rows × 104 columns
This tuple will also have to be filtered out during the transformation phase, since the T team cannot win when the bomb has not been planted and the whole team has been eliminated.
Let us now check whether there are tuples where the CTs win but no player on their team is alive.
ds[(ds['ct_players_alive']==0) & (ds['round_winner']=='CT')]
0 rows × 104 columns
On the CT side, by contrast, we find no suspicious tuples.
Let us take a look at some summary statistics from the dataframe.
ds.describe()
| time_left | ct_score | t_score | ct_health | t_health | ct_armor | t_armor | ct_money | t_money | ct_helmets | ... | t_grenade_molotovgrenade | ct_grenade_decoygrenade | t_grenade_decoygrenade | ct_equipment_cost | t_equipment_cost | equipment_difference | ct_equipment_per_player | t_equipment_per_player | score_difference | money_difference | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 122410.000000 | 122410.000000 | 122410.000000 | 122410.000000 | 122410.000000 | 122410.000000 | 122410.000000 | 122410.000000 | 122410.000000 | 122410.000000 | ... | 122410.000000 | 122410.000000 | 122410.000000 | 122410.000000 | 122410.000000 | 122410.000000 | 122410.000000 | 122410.000000 | 122410.000000 | 122410.000000 |
| mean | 97.886922 | 6.709239 | 6.780435 | 412.106568 | 402.714500 | 314.142121 | 298.444670 | 9789.023773 | 11241.036680 | 2.053901 | ... | 1.352095 | 0.027694 | 0.025750 | 12311.216812 | 10656.114697 | 1655.102116 | 2943.413596 | 2551.048171 | -0.071195 | -1452.012907 |
| std | 54.465238 | 4.790362 | 4.823543 | 132.293290 | 139.919033 | 171.029736 | 174.576545 | 11215.042286 | 12162.806759 | 1.841470 | ... | 1.663246 | 0.169531 | 0.164162 | 8765.859741 | 7423.491910 | 9482.794785 | 1830.565204 | 1572.822956 | 4.153461 | 13104.484044 |
| min | 0.010000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | -26850.000000 | 0.000000 | 0.000000 | -15.000000 | -65950.000000 |
| 25% | 54.920000 | 3.000000 | 3.000000 | 350.000000 | 322.000000 | 194.000000 | 174.000000 | 1300.000000 | 1550.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 3600.000000 | 2800.000000 | -4000.000000 | 940.000000 | 700.000000 | -3.000000 | -7600.000000 |
| 50% | 94.910000 | 6.000000 | 6.000000 | 500.000000 | 500.000000 | 377.000000 | 334.000000 | 5500.000000 | 7150.000000 | 2.000000 | ... | 1.000000 | 0.000000 | 0.000000 | 12200.000000 | 11250.000000 | 700.000000 | 3516.666667 | 3150.000000 | 0.000000 | -400.000000 |
| 75% | 166.917500 | 10.000000 | 10.000000 | 500.000000 | 500.000000 | 486.000000 | 468.000000 | 14600.000000 | 18000.000000 | 4.000000 | ... | 2.000000 | 0.000000 | 0.000000 | 20250.000000 | 17350.000000 | 7400.000000 | 4500.000000 | 3850.000000 | 3.000000 | 4400.000000 |
| max | 175.000000 | 32.000000 | 33.000000 | 500.000000 | 600.000000 | 500.000000 | 500.000000 | 80000.000000 | 80000.000000 | 5.000000 | ... | 5.000000 | 3.000000 | 2.000000 | 34400.000000 | 30350.000000 | 29200.000000 | 7250.000000 | 6350.000000 | 14.000000 | 72000.000000 |
8 rows × 101 columns
Besides the outliers in the number of players alive, there are other suspicious features relating to the number of weapons.
ds[ds['t_weapon_cz75auto']>5]['t_weapon_cz75auto']
98707 6.0 Name: t_weapon_cz75auto, dtype: float64
len(ds[ds['t_weapon_glock']>5])
14
len(ds[ds['ct_weapon_usps']>5])
15
There is one tuple in which the cz75auto is carried by 6 T players, 14 tuples in which more than 5 players carry a glock, and another 15 in which more than 5 CTs have a usps. These are all pistols; in any case, it is impossible for any weapon to appear more than 5 times per team in a snapshot, since a team has at most 5 players and each player can carry at most one primary weapon and one pistol. These tuples will also have to be filtered out.
Let us check in general the cases where the number of weapons exceeds the number of players alive; to do so we need to split the check between primary and secondary weapons.
def check_weapons_outliers(columns, players):
    tmp = [ds[columns].sum(axis=1), ds[players]]
    headers = ['weapons_sum', players]
    tmp_ds = ps.concat(tmp, axis=1, keys=headers)
    print(len(tmp_ds[tmp_ds['weapons_sum'] > tmp_ds[players]]))
check_weapons_outliers(primary_ct,'ct_players_alive')
0
check_weapons_outliers(secondary_ct,'ct_players_alive')
57
check_weapons_outliers(primary_t,'t_players_alive')
0
check_weapons_outliers(secondary_t,'t_players_alive')
80
We have thus identified a few anomalous tuples, which will be handled in the next phase.
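The filtering announced here could look like the following sketch: a hypothetical helper (`drop_impossible_rows`, not part of the notebook) that keeps only snapshots where each weapon-category total does not exceed the number of players alive, with column lists playing the role of primary_t/secondary_t.

```python
import pandas as pd

def drop_impossible_rows(ds, primary_cols, secondary_cols, side):
    # Keep rows where neither the primary nor the secondary weapon count
    # exceeds the team's players alive (side is 't' or 'ct').
    alive = ds[f'{side}_players_alive']
    ok = (ds[primary_cols].sum(axis=1) <= alive) & (ds[secondary_cols].sum(axis=1) <= alive)
    return ds[ok]

# Synthetic snapshots: the second row has 6 glocks but only 4 players alive.
demo = pd.DataFrame({
    't_players_alive': [5, 4],
    't_weapon_ak47':  [3, 1],   # primary weapons
    't_weapon_glock': [2, 6],   # secondary weapons (6 > 4 -> impossible)
})
clean = drop_impossible_rows(demo, ['t_weapon_ak47'], ['t_weapon_glock'], 't')
```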
Returning to the correlation matrix, we can investigate more closely which attributes depend most on the equipment cost field, since the graphical representation has already allowed us to make some hypotheses about it.
corr_matrix['t_equipment_cost'].sort_values(ascending=False)
t_equipment_cost 1.000000 t_helmets 0.934230 t_equipment_per_player 0.864552 t_armor 0.825925 t_grenade_flashbang 0.819111 t_grenade_smokegrenade 0.804667 t_grenade_molotovgrenade 0.781154 t_weapon_ak47 0.665206 t_weapon_sg553 0.578258 t_weapon_awp 0.567644 t_grenade_hegrenade 0.494794 ct_weapon_m4a4 0.351979 ct_equipment_cost 0.322906 t_players_alive 0.309776 ct_grenade_smokegrenade 0.301454 ct_grenade_flashbang 0.294086 round_winner 0.280764 ct_equipment_per_player 0.277681 ct_grenade_incendiarygrenade 0.269944 t_health 0.256400 ct_grenade_hegrenade 0.242212 ct_armor 0.229308 ct_weapon_awp 0.209674 t_money 0.189660 t_score 0.168231 ct_weapon_deagle 0.165783 t_weapon_glock 0.163695 t_weapon_m4a4 0.124116 t_weapon_p250 0.111747 ct_helmets 0.111657 ct_weapon_p250 0.108038 ct_defuse_kits 0.102053 ct_weapon_m4a1s 0.095739 ct_health 0.094005 t_weapon_galilar 0.093795 t_grenade_incendiarygrenade 0.090703 ct_score 0.089592 ct_weapon_cz75auto 0.088024 ct_players_alive 0.084240 ct_weapon_fiveseven 0.082397 ct_weapon_aug 0.081438 ct_weapon_famas 0.067823 t_weapon_usps 0.061687 t_weapon_fiveseven 0.060639 t_weapon_mac10 0.057005 t_weapon_aug 0.048151 t_weapon_g3sg1 0.042887 t_weapon_ump45 0.041834 ct_weapon_mag7 0.039305 ct_weapon_ssg08 0.037608 ct_grenade_decoygrenade 0.035183 t_weapon_m4a1s 0.034257 t_weapon_elite 0.029229 t_weapon_tec9 0.026982 ct_weapon_xm1014 0.025430 t_grenade_decoygrenade 0.025384 t_weapon_famas 0.020692 ct_weapon_ump45 0.019441 ct_weapon_nova 0.018488 t_weapon_deagle 0.015612 t_weapon_mp5sd 0.011429 t_weapon_mp7 0.009381 ct_weapon_scar20 0.008526 t_weapon_p90 0.008421 t_weapon_p2000 0.006019 t_weapon_cz75auto 0.005718 ct_weapon_p90 0.005625 ct_weapon_m249 0.003906 t_weapon_bizon 0.002495 ct_weapon_mp9 0.002142 t_weapon_negev 0.001788 t_weapon_scar20 0.000498 t_weapon_mag7 -0.001051 t_weapon_nova -0.001055 t_weapon_xm1014 -0.001521 ct_weapon_mp7 -0.001943 t_weapon_sawedoff -0.003241 t_weapon_mp9 -0.006317 t_weapon_r8revolver -0.006577 ct_weapon_mp5sd 
-0.006900 ct_weapon_tec9 -0.009789 t_weapon_ssg08 -0.016073 ct_weapon_elite -0.017631 ct_weapon_p2000 -0.023130 ct_weapon_glock -0.026935 ct_weapon_mac10 -0.036374 ct_weapon_galilar -0.045534 ct_weapon_sg553 -0.049070 ct_grenade_molotovgrenade -0.050246 ct_money -0.064672 ct_weapon_ak47 -0.078648 bomb_planted -0.083538 ct_weapon_usps -0.090726 score_difference -0.092041 time_left -0.180196 money_difference -0.231379 equipment_difference -0.484345 ct_weapon_bizon NaN ct_weapon_g3sg1 NaN t_weapon_m249 NaN ct_weapon_negev NaN ct_weapon_r8revolver NaN ct_weapon_sawedoff NaN Name: t_equipment_cost, dtype: float64
corr_matrix['ct_equipment_cost'].sort_values(ascending=False)
ct_equipment_cost 1.000000 ct_grenade_flashbang 0.868431 ct_grenade_smokegrenade 0.804697 ct_armor 0.792333 ct_defuse_kits 0.781452 ct_helmets 0.780664 ct_grenade_incendiarygrenade 0.745160 ct_weapon_m4a4 0.705434 ct_weapon_awp 0.700686 ct_grenade_hegrenade 0.682121 equipment_difference 0.671613 ct_players_alive 0.327241 t_grenade_molotovgrenade 0.327161 t_equipment_cost 0.322906 ct_weapon_sg553 0.319135 t_grenade_smokegrenade 0.317414 ct_weapon_ak47 0.311690 ct_health 0.304638 t_helmets 0.301765 t_grenade_flashbang 0.297763 t_armor 0.248662 t_weapon_ak47 0.245449 ct_weapon_aug 0.242551 ct_weapon_p250 0.222511 ct_score 0.215838 ct_money 0.203549 t_weapon_sg553 0.188495 ct_weapon_m4a1s 0.184725 t_weapon_deagle 0.162283 t_weapon_awp 0.152033 ct_grenade_molotovgrenade 0.145975 ct_weapon_usps 0.139587 score_difference 0.126518 t_health 0.116695 t_grenade_hegrenade 0.112618 t_weapon_cz75auto 0.111459 t_score 0.105411 t_players_alive 0.102503 ct_weapon_tec9 0.077287 ct_weapon_famas 0.076499 t_weapon_tec9 0.069766 ct_grenade_decoygrenade 0.054904 ct_weapon_mp9 0.046376 ct_weapon_deagle 0.045906 ct_weapon_scar20 0.034328 ct_weapon_xm1014 0.028296 ct_weapon_p2000 0.024524 ct_weapon_glock 0.022528 ct_weapon_ump45 0.021985 t_weapon_g3sg1 0.017566 ct_weapon_galilar 0.016441 t_weapon_p250 0.015028 t_weapon_sawedoff 0.013604 t_weapon_r8revolver 0.008605 t_grenade_decoygrenade 0.008256 t_weapon_elite 0.008006 ct_weapon_fiveseven 0.007940 t_money 0.006630 ct_weapon_p90 0.006101 t_weapon_nova 0.005938 t_weapon_glock 0.005895 t_weapon_negev 0.005670 ct_weapon_m249 0.005181 ct_weapon_elite 0.005122 ct_weapon_mp5sd 0.004797 t_weapon_p90 0.004420 ct_weapon_mp7 0.004311 t_weapon_ssg08 0.003471 t_weapon_bizon -0.000532 t_weapon_mag7 -0.002132 t_weapon_xm1014 -0.002716 t_weapon_scar20 -0.003688 t_weapon_galilar -0.005549 ct_weapon_cz75auto -0.013404 ct_weapon_nova -0.015917 t_weapon_m4a1s -0.017061 ct_weapon_mag7 -0.017669 t_weapon_aug -0.021947 ct_weapon_mac10 -0.024695 t_weapon_mp7 
-0.026639 t_weapon_fiveseven -0.028547 t_grenade_incendiarygrenade -0.030403 t_weapon_mp5sd -0.036283 t_weapon_mp9 -0.043043 t_weapon_p2000 -0.045547 t_weapon_famas -0.048705 t_weapon_ump45 -0.057562 t_weapon_m4a4 -0.067366 time_left -0.070850 t_weapon_mac10 -0.083288 ct_weapon_ssg08 -0.099283 t_weapon_usps -0.160191 bomb_planted -0.218837 round_winner -0.311727 ct_weapon_bizon NaN ct_weapon_g3sg1 NaN t_weapon_m249 NaN ct_weapon_negev NaN ct_weapon_r8revolver NaN ct_weapon_sawedoff NaN Name: ct_equipment_cost, dtype: float64
As shown by the plot and by the analysis of the correlation matrix, equipment cost has a fairly strong link with several other attributes, around 0.8, slightly more pronounced for the Ts.
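Strongly correlated attribute pairs can also be extracted from the matrix programmatically rather than by eye. A sketch assuming a correlation matrix like corr_matrix; the helper name `correlated_pairs` and the threshold are illustrative.

```python
import pandas as pd

def correlated_pairs(corr, threshold=0.8):
    # Scan only the upper triangle to skip the diagonal and duplicate pairs.
    pairs = []
    cols = corr.columns
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if abs(corr.iloc[i, j]) > threshold:
                pairs.append((cols[i], cols[j], corr.iloc[i, j]))
    return pairs

# Synthetic data: 'b' is exactly 2*'a', 'c' is only weakly related to both.
demo = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [2, 4, 6, 8], 'c': [4, 1, 3, 2]})
pairs = correlated_pairs(demo.corr(), threshold=0.8)
# Only the ('a', 'b') pair crosses the 0.8 threshold here.
```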
corr_matrix[target_feature].sort_values(ascending=False)
round_winner 1.000000 t_helmets 0.297458 t_armor 0.290753 t_equipment_cost 0.280764 t_weapon_ak47 0.194147 bomb_planted 0.187101 t_grenade_flashbang 0.166839 t_weapon_sg553 0.163709 t_weapon_awp 0.149878 t_players_alive 0.142518 t_grenade_smokegrenade 0.140348 t_weapon_usps 0.136694 t_grenade_molotovgrenade 0.116754 t_grenade_hegrenade 0.116336 t_money 0.098362 t_health 0.091361 t_weapon_m4a4 0.088002 t_weapon_mac10 0.087114 t_weapon_glock 0.069004 t_weapon_ump45 0.067326 t_weapon_galilar 0.062742 t_grenade_incendiarygrenade 0.050546 t_weapon_p250 0.045456 ct_weapon_ssg08 0.038482 t_weapon_fiveseven 0.037264 t_weapon_p2000 0.036279 t_weapon_famas 0.035858 t_weapon_mp5sd 0.032810 t_weapon_aug 0.031507 t_weapon_m4a1s 0.031000 t_score 0.026863 ct_weapon_fiveseven 0.024263 t_grenade_decoygrenade 0.023998 t_weapon_mp9 0.023775 ct_weapon_deagle 0.014515 ct_weapon_cz75auto 0.013852 t_weapon_elite 0.013352 ct_weapon_mag7 0.012895 t_weapon_mp7 0.012854 t_weapon_g3sg1 0.005960 t_weapon_p90 0.005658 ct_weapon_nova 0.004607 t_weapon_mag7 0.004531 t_weapon_bizon 0.003439 ct_weapon_m249 0.002803 ct_weapon_mp7 0.000020 t_weapon_xm1014 -0.001328 t_weapon_tec9 -0.001494 t_weapon_ssg08 -0.001628 t_weapon_sawedoff -0.002311 ct_weapon_mac10 -0.002529 t_weapon_scar20 -0.002915 t_weapon_negev -0.002915 ct_weapon_scar20 -0.002965 t_weapon_cz75auto -0.005177 t_weapon_nova -0.007339 t_weapon_r8revolver -0.007712 ct_weapon_p90 -0.009201 ct_weapon_xm1014 -0.009938 ct_weapon_mp5sd -0.011656 ct_weapon_elite -0.015853 ct_weapon_glock -0.021769 ct_grenade_decoygrenade -0.022678 ct_weapon_p2000 -0.023314 ct_weapon_ump45 -0.025873 ct_weapon_galilar -0.035162 t_weapon_deagle -0.035230 ct_weapon_tec9 -0.042672 ct_weapon_m4a1s -0.047719 ct_weapon_p250 -0.049614 ct_weapon_famas -0.049922 ct_score -0.057304 ct_weapon_mp9 -0.064231 time_left -0.068994 ct_grenade_molotovgrenade -0.079758 ct_weapon_aug -0.080578 score_difference -0.097288 ct_money -0.129326 ct_weapon_usps -0.152893 ct_weapon_sg553 
-0.163963 ct_weapon_ak47 -0.166855 ct_grenade_incendiarygrenade -0.168517 ct_grenade_hegrenade -0.168781 ct_weapon_m4a4 -0.178008 ct_health -0.190662 ct_weapon_awp -0.198626 ct_grenade_smokegrenade -0.209975 ct_players_alive -0.216798 ct_grenade_flashbang -0.253868 ct_defuse_kits -0.291557 ct_helmets -0.308255 ct_equipment_cost -0.311727 ct_armor -0.336382 equipment_difference -0.507952 ct_weapon_bizon NaN ct_weapon_g3sg1 NaN t_weapon_m249 NaN ct_weapon_negev NaN ct_weapon_r8revolver NaN ct_weapon_sawedoff NaN Name: round_winner, dtype: float64
Moreover, the target attribute does not show a high correlation with the other features; from this point of view the dataset is fairly balanced.
We can plot the value distributions of some attributes of the dataframe.
ds[['equipment_difference', 'score_difference', 'round_winner', 'time_left', 't_money', 'ct_money', 'map',
    't_health', 'ct_health', 't_armor', 'ct_armor', 'ct_players_alive', 't_players_alive']].hist(figsize=(20, 20))
plt.show()
A roughly Gaussian distribution can be seen for the equipment difference and the score difference.
With a simple random forest we can also get a first-hand look at which attributes of the dataset are likely to matter most.
from sklearn.model_selection import train_test_split
from sklearn.metrics import *
from sklearn.ensemble import RandomForestClassifier
from pandas.api.types import is_numeric_dtype

ds_copy = ds.copy()
for c in ds_copy.columns:
    if not is_numeric_dtype(ds_copy[c]):
        ds_copy[c] = Encoder_df.fit_transform(ds_copy[c])
ds_x = ds_copy.drop(target_feature, axis=1)
ds_y = ds_copy[target_feature]
X_train, X_test, y_train, y_test = train_test_split(ds_x, ds_y, random_state=42, stratify=ds_y)
rnd_class = RandomForestClassifier(n_estimators=40, n_jobs=-1, random_state=42)
rnd_class.fit(X_train, y_train)
y_rnd_pred = rnd_class.predict(X_test)
attributes = ds_x.columns
importances = rnd_class.feature_importances_
index = np.argsort(importances)
plt.figure(figsize=(20,15))
plt.title("Attribute importance")
p = plt.barh(range(len(index)), importances[index], color='r', align='center')
plt.yticks(range(len(index)), attributes[index])
plt.xlabel("Relative importance")
plt.show()
Among the most important features we clearly find economic aspects, such as the equipment difference, but also strategic ones such as the map the round is played on. The use of certain weapons also appears to matter greatly, especially the most popular ones such as the ak47 or the m4a4.
threshold=0.0006
valuable_features = np.extract(importances[index]>threshold,attributes[index])
valuable_features
array(['t_grenade_incendiarygrenade', 't_weapon_ssg08', 't_weapon_tec9',
't_weapon_ump45', 'ct_weapon_ssg08', 'ct_grenade_molotovgrenade',
'ct_grenade_decoygrenade', 't_weapon_m4a4',
't_grenade_decoygrenade', 'ct_weapon_ump45', 't_weapon_mac10',
'ct_weapon_fiveseven', 'ct_weapon_m4a1s', 't_weapon_galilar',
't_weapon_usps', 't_weapon_cz75auto', 't_weapon_awp',
'ct_weapon_famas', 'ct_weapon_mp9', 'ct_weapon_cz75auto',
'ct_weapon_aug', 'ct_weapon_sg553', 'ct_weapon_p250',
'ct_weapon_ak47', 'ct_weapon_awp', 't_weapon_deagle',
'ct_weapon_p2000', 't_weapon_p250', 'ct_weapon_deagle',
'bomb_planted', 't_grenade_hegrenade', 'ct_grenade_smokegrenade',
'ct_grenade_hegrenade', 'ct_weapon_m4a4', 't_weapon_sg553',
'ct_grenade_incendiarygrenade', 't_grenade_molotovgrenade',
't_grenade_smokegrenade', 't_weapon_glock', 'ct_weapon_usps',
'ct_players_alive', 't_weapon_ak47', 't_grenade_flashbang',
't_players_alive', 'ct_grenade_flashbang', 'ct_helmets', 't_score',
'ct_health', 'map', 'ct_score', 'score_difference',
'ct_defuse_kits', 't_health', 'time_left', 'ct_money', 't_helmets',
't_money', 't_equipment_cost', 'ct_equipment_per_player',
't_equipment_per_player', 'money_difference', 't_armor',
'ct_equipment_cost', 'ct_armor', 'equipment_difference'],
dtype=object)
valuable_features=['t_grenade_incendiarygrenade', 't_weapon_ssg08', 't_weapon_tec9',
't_weapon_ump45', 'ct_weapon_ssg08', 'ct_grenade_molotovgrenade',
'ct_grenade_decoygrenade', 't_weapon_m4a4',
't_grenade_decoygrenade', 'ct_weapon_ump45', 't_weapon_mac10',
'ct_weapon_fiveseven', 'ct_weapon_m4a1s', 't_weapon_galilar',
't_weapon_usps', 't_weapon_cz75auto', 't_weapon_awp',
'ct_weapon_famas', 'ct_weapon_mp9', 'ct_weapon_cz75auto',
'ct_weapon_aug', 'ct_weapon_sg553', 'ct_weapon_p250',
'ct_weapon_ak47', 'ct_weapon_awp', 't_weapon_deagle',
'ct_weapon_p2000', 't_weapon_p250', 'ct_weapon_deagle',
'bomb_planted', 't_grenade_hegrenade', 'ct_grenade_smokegrenade',
'ct_grenade_hegrenade', 'ct_weapon_m4a4', 't_weapon_sg553',
'ct_grenade_incendiarygrenade', 't_grenade_molotovgrenade',
't_grenade_smokegrenade', 't_weapon_glock', 'ct_weapon_usps',
'ct_players_alive', 't_weapon_ak47', 't_grenade_flashbang',
't_players_alive', 'ct_grenade_flashbang', 'ct_helmets', 't_score',
'ct_health', 'map', 'ct_score', 'score_difference',
'ct_defuse_kits', 't_health', 'time_left', 'ct_money', 't_helmets',
't_money', 't_equipment_cost', 'ct_equipment_per_player',
't_equipment_per_player', 'money_difference', 't_armor',
'ct_equipment_cost', 'ct_armor', 'equipment_difference',target_feature]
From the data visualization we have drawn the following conclusions: the target class is nearly balanced (the CTs win about 51% of rounds); economic attributes (equipment cost and difference, armor, helmets) are strongly linked to the round winner; defuse kits noticeably improve the CTs' chances once the bomb is planted; and the dataset contains a handful of impossible tuples (more than 5 players alive, more weapons than players, T wins with no Ts alive and no bomb planted) that must be filtered out.
In this section the data will be transformed, in view of the moment when the classifiers will be introduced and trained. Part of these operations has already been carried out in the previous paragraphs: the dataset was enriched with some columns obtained by introducing external information (weapon costs) or by combining features already available (such as the score difference). Moreover, for visualization purposes the round_winner field has already been encoded (we moved from the labels 'CT' and 'T' to 0 and 1 respectively).
We therefore define some functions in order to modularize this phase (the function for removing highly correlated features was already introduced in the first part of the project); these procedures can in fact be adapted to other datasets as well.
def remove_unimportant_features(valuable_features, df):
    # Drop every column of df that is not in the list of valuable features.
    to_be_removed = []
    for c in df.columns:
        if c not in valuable_features:
            to_be_removed.append(c)
    df.drop(columns=to_be_removed, inplace=True)

def encode_categorical(ds):
    # Label-encode every non-numeric column in place.
    for c in ds.columns:
        if not is_numeric_dtype(ds[c]):
            ds[c] = Encoder_df.fit_transform(ds[c])
We create a backup of the dataset used so far; we will temporarily work on a substitute dataframe, "ds1".
ds1=ds.copy()
ds_backup=ds.copy()
Let us now take a general look at the dataset.
ds1.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 122410 entries, 0 to 122409 Columns: 104 entries, time_left to money_difference dtypes: bool(1), float64(101), object(2) memory usage: 96.3+ MB
Let us check how many missing values there are.
ds.isnull().sum().sort_values(ascending=False)
time_left 0 ct_score 0 ct_weapon_usps 0 t_weapon_fiveseven 0 ct_weapon_fiveseven 0 t_weapon_deagle 0 ct_weapon_deagle 0 t_weapon_xm1014 0 ct_weapon_xm1014 0 t_weapon_ump45 0 ct_weapon_ump45 0 t_weapon_ssg08 0 ct_weapon_ssg08 0 t_weapon_sg553 0 ct_weapon_sg553 0 t_weapon_scar20 0 ct_weapon_scar20 0 t_weapon_sawedoff 0 ct_weapon_sawedoff 0 t_weapon_r8revolver 0 ct_weapon_r8revolver 0 t_weapon_p90 0 ct_weapon_p90 0 t_weapon_nova 0 ct_weapon_nova 0 t_weapon_usps 0 ct_weapon_p250 0 t_weapon_p250 0 ct_grenade_molotovgrenade 0 score_difference 0 t_equipment_per_player 0 ct_equipment_per_player 0 equipment_difference 0 t_equipment_cost 0 ct_equipment_cost 0 round_winner 0 t_grenade_decoygrenade 0 ct_grenade_decoygrenade 0 t_grenade_molotovgrenade 0 t_grenade_incendiarygrenade 0 ct_weapon_p2000 0 ct_grenade_incendiarygrenade 0 t_grenade_smokegrenade 0 ct_grenade_smokegrenade 0 t_grenade_flashbang 0 ct_grenade_flashbang 0 t_grenade_hegrenade 0 ct_grenade_hegrenade 0 t_weapon_tec9 0 ct_weapon_tec9 0 t_weapon_p2000 0 t_weapon_negev 0 ct_weapon_negev 0 t_weapon_mp9 0 ct_defuse_kits 0 t_weapon_bizon 0 ct_weapon_bizon 0 t_weapon_awp 0 ct_weapon_awp 0 t_weapon_aug 0 ct_weapon_aug 0 t_weapon_ak47 0 ct_weapon_ak47 0 t_players_alive 0 ct_players_alive 0 t_helmets 0 t_weapon_cz75auto 0 ct_helmets 0 t_money 0 ct_money 0 t_armor 0 ct_armor 0 t_health 0 ct_health 0 bomb_planted 0 map 0 t_score 0 ct_weapon_cz75auto 0 ct_weapon_elite 0 ct_weapon_mp9 0 t_weapon_m4a1s 0 t_weapon_mp7 0 ct_weapon_mp7 0 t_weapon_mp5sd 0 ct_weapon_mp5sd 0 t_weapon_mag7 0 ct_weapon_mag7 0 t_weapon_mac10 0 ct_weapon_mac10 0 t_weapon_m4a4 0 ct_weapon_m4a4 0 ct_weapon_m4a1s 0 t_weapon_elite 0 t_weapon_m249 0 ct_weapon_m249 0 t_weapon_glock 0 ct_weapon_glock 0 t_weapon_galilar 0 ct_weapon_galilar 0 t_weapon_g3sg1 0 ct_weapon_g3sg1 0 t_weapon_famas 0 ct_weapon_famas 0 money_difference 0 dtype: int64
To better understand their impact, we compute the percentage of nulls in each column.
#ps.set_option("max_columns",None)
ps.DataFrame(ds.isnull().sum()/ds.shape[0]).T
| time_left | ct_score | t_score | map | bomb_planted | ct_health | t_health | ct_armor | t_armor | ct_money | ... | ct_grenade_decoygrenade | t_grenade_decoygrenade | round_winner | ct_equipment_cost | t_equipment_cost | equipment_difference | ct_equipment_per_player | t_equipment_per_player | score_difference | money_difference | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1 rows × 104 columns
Nothing looks problematic (indeed, there are no missing values at all); as a precaution, we replace any nulls with the column mean anyway.
ds1.fillna(ds1.mean(),inplace=True)
For the purposes of the classification process, the categorical attributes must be converted to numeric ones.
encode_categorical(ds1)
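As a quick illustration (on toy labels, not the actual dataset), scikit-learn's LabelEncoder assigns integers following the alphabetical order of the classes, which is consistent with the 'CT' → 0, 'T' → 1 encoding used earlier for round_winner:

```python
from sklearn.preprocessing import LabelEncoder

# 'CT' sorts before 'T', so it is assigned the smaller integer.
enc = LabelEncoder()
encoded = enc.fit_transform(['CT', 'T', 'T', 'CT'])
print(list(encoded))        # [0, 1, 1, 0]
print(list(enc.classes_))   # ['CT', 'T']
```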
We now deal with outlier handling. As described earlier, there were two main categories of problems to analyze:
We show the dataframe's shape here, to see how many tuples will be removed during the clean-up process.
ds1.shape
(122410, 104)
We now filter out the tuples whose weapon count is anomalous. Here a general-purpose helper function is hardly applicable, since the context is far too specific.
ds1=ds1[(ds1[primary_ct].sum(axis=1)<=ds1['ct_players_alive']) &
(ds1[secondary_ct].sum(axis=1)<=ds1['ct_players_alive']) &
(ds1[primary_t].sum(axis=1)<=ds1['t_players_alive']) &
(ds1[secondary_t].sum(axis=1)<=ds1['t_players_alive'])]
ds1.shape
(122284, 104)
In total 126 tuples were removed (122410 − 122284).
Let us move on to the second type of anomaly.
ds1=ds1.drop(ds1[(ds1['t_players_alive']==0) & (ds1['round_winner']==1) & (ds1['bomb_planted']==False)].index)
ds1=ds1.drop(ds1[(ds1['ct_players_alive']==0) & (ds1['round_winner']==0)].index)
ds1.shape
(122283, 104)
In this case the number was far smaller: a single tuple.
After handling the outliers, we reduce the number of columns in the dataframe by removing those whose correlation is too high. We defined a function specifically for this.
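The function itself was defined in the first part of the project; a minimal sketch of how such a helper can work (an illustrative reconstruction, not necessarily the exact implementation used) is:

```python
import numpy as np
import pandas as pd

def remove_collinear_features_sketch(df, threshold):
    """Drop one feature from each pair whose absolute Pearson
    correlation exceeds `threshold` (illustrative reconstruction)."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is examined once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# Tiny demo: 'y' is a multiple of 'x', so it gets dropped.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
demo = pd.DataFrame({'x': x, 'y': 2 * x, 'z': rng.normal(size=100)})
print(list(remove_collinear_features_sketch(demo, 0.6).columns))  # ['x', 'z']
```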
ds1=remove_collinear_features(ds1,0.6)
t_score | ct_score | 0.63 ct_health | time_left | 0.68 ct_health | bomb_planted | 0.62 t_health | time_left | 0.68 t_health | ct_health | 0.76 ct_helmets | ct_armor | 0.69 t_helmets | t_armor | 0.88 ct_defuse_kits | ct_armor | 0.6 ct_defuse_kits | ct_helmets | 0.77 ct_players_alive | time_left | 0.64 ct_players_alive | bomb_planted | 0.62 ct_players_alive | ct_health | 0.97 ct_players_alive | t_health | 0.7 t_players_alive | ct_health | 0.68 t_players_alive | t_health | 0.96 t_players_alive | ct_players_alive | 0.63 t_weapon_ak47 | t_helmets | 0.67 t_weapon_glock | t_health | 0.64 t_weapon_glock | t_players_alive | 0.65 ct_weapon_usps | ct_health | 0.62 ct_weapon_usps | ct_players_alive | 0.62 ct_grenade_flashbang | ct_armor | 0.73 ct_grenade_flashbang | ct_helmets | 0.73 ct_grenade_flashbang | ct_defuse_kits | 0.7 ct_grenade_flashbang | ct_grenade_hegrenade | 0.7 t_grenade_flashbang | t_armor | 0.74 t_grenade_flashbang | t_helmets | 0.77 ct_grenade_smokegrenade | ct_armor | 0.67 ct_grenade_smokegrenade | ct_helmets | 0.64 ct_grenade_smokegrenade | ct_grenade_hegrenade | 0.75 ct_grenade_smokegrenade | ct_grenade_flashbang | 0.81 t_grenade_smokegrenade | t_armor | 0.7 t_grenade_smokegrenade | t_helmets | 0.75 t_grenade_smokegrenade | t_grenade_flashbang | 0.83 ct_grenade_incendiarygrenade | ct_defuse_kits | 0.61 ct_grenade_incendiarygrenade | ct_grenade_hegrenade | 0.76 ct_grenade_incendiarygrenade | ct_grenade_flashbang | 0.75 ct_grenade_incendiarygrenade | ct_grenade_smokegrenade | 0.8 t_grenade_molotovgrenade | t_armor | 0.64 t_grenade_molotovgrenade | t_helmets | 0.7 t_grenade_molotovgrenade | t_grenade_flashbang | 0.82 t_grenade_molotovgrenade | t_grenade_smokegrenade | 0.81 ct_equipment_cost | ct_armor | 0.79 ct_equipment_cost | ct_helmets | 0.78 ct_equipment_cost | ct_defuse_kits | 0.78 ct_equipment_cost | ct_weapon_awp | 0.7 ct_equipment_cost | ct_weapon_m4a4 | 0.71 ct_equipment_cost | ct_grenade_hegrenade | 0.68 ct_equipment_cost | ct_grenade_flashbang | 
0.87 ct_equipment_cost | ct_grenade_smokegrenade | 0.8 ct_equipment_cost | ct_grenade_incendiarygrenade | 0.75 t_equipment_cost | t_armor | 0.83 t_equipment_cost | t_helmets | 0.93 t_equipment_cost | t_weapon_ak47 | 0.66 t_equipment_cost | t_grenade_flashbang | 0.82 t_equipment_cost | t_grenade_smokegrenade | 0.8 t_equipment_cost | t_grenade_molotovgrenade | 0.78 equipment_difference | ct_helmets | 0.63 equipment_difference | ct_defuse_kits | 0.64 equipment_difference | ct_equipment_cost | 0.67 ct_equipment_per_player | ct_helmets | 0.67 ct_equipment_per_player | ct_defuse_kits | 0.69 ct_equipment_per_player | ct_weapon_awp | 0.68 ct_equipment_per_player | ct_weapon_m4a4 | 0.62 ct_equipment_per_player | ct_grenade_flashbang | 0.72 ct_equipment_per_player | ct_grenade_smokegrenade | 0.65 ct_equipment_per_player | ct_grenade_incendiarygrenade | 0.61 ct_equipment_per_player | ct_equipment_cost | 0.87 t_equipment_per_player | t_armor | 0.65 t_equipment_per_player | t_helmets | 0.81 t_equipment_per_player | t_grenade_flashbang | 0.65 t_equipment_per_player | t_grenade_smokegrenade | 0.63 t_equipment_per_player | t_grenade_molotovgrenade | 0.63 t_equipment_per_player | t_equipment_cost | 0.87 money_difference | t_money | 0.61
ds1.shape
(122283, 81)
Twenty-three columns were dropped!
Based on the feature-importance analysis carried out in the previous section, we can now drop the columns whose contribution turned out to be far too small. Here too, we use a dedicated function.
remove_unimportant_features(valuable_features,ds1)
ds1.shape
(122283, 43)
We have thus prepared the dataset for the next phase, dedicated to the classification models. The points covered in this section were:
As for feature scaling, we preferred to handle that aspect later in the project, so that the technique can be adapted to the particular classifier in use.
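A hedged sketch of that approach (the variable names are illustrative): wrapping the scaler in a scikit-learn Pipeline ties the scaling step to the specific model, so it is applied only where it helps, e.g. to distance-based models such as k-NN, while tree ensembles can be left unscaled:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# Distance-based models are sensitive to feature scale, so the scaler
# is fitted together with the model inside the pipeline (and re-fitted
# on each cross-validation fold, avoiding data leakage).
knn_model = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Tree ensembles split on per-feature thresholds: no scaling step needed.
forest_model = RandomForestClassifier(random_state=42)
```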
ds_orig=ds.copy()
ds=ds1.copy()
To make model evaluation easier, we define a method that computes their performance via cross-validation. This technique splits the original dataset into k parts: k−1 are used for training and the remaining one for testing; the process is repeated k times, so that each part serves once as the test set. The goal is to obtain an estimate of the model's performance that is as accurate as possible, reducing the bias one would get from a simple train-test split. This procedure will play a key role in hyperparameter tuning, where a hyperparameter is a model parameter that cannot be estimated through learning (such as the number of neighbors in k-nearest neighbors).
from sklearn.model_selection import cross_val_score
def compute_performance(estimators, X, y, scoring_funs):
    '''
    Compute every score function for each estimator.

    Parameters:
    ----------
    estimators : a list of predictors
    X : data used to compute the cross-validation scores
    y : target values
    scoring_funs : list of performance measures

    Returns:
    --------
    A dictionary containing the score of each estimator
    '''
    score_dict = {}
    for e in estimators:
        score_dict[e.__class__.__name__] = {}
        for f in scoring_funs:
            scores = cross_val_score(e, X, y, cv=5, scoring=f)
            score = scores.mean()
            score_std = scores.std()
            print("std of " + str(score) + " of type " + str(f) + " is " + str(score_std))
            score_dict[e.__class__.__name__][f] = score
    return score_dict
Here k was set to 5 to strike a good balance between evaluation quality and running time. Empirically, k = 10 has been observed to discriminate the quality of a model well; however, considering the high number of records in the dataset and the (complex) models to be trained, a smaller value was chosen.
from sklearn.model_selection import train_test_split
from sklearn.metrics import *
ds_x=ds.drop(target_feature,axis=1)
ds_y=ds[target_feature]
X_train, X_test, y_train, y_test = train_test_split(ds_x,ds_y, random_state=42,stratify=ds_y)
scoring_funs = ['accuracy', 'precision', 'recall']
The score types chosen are the classic ones used in classification. Accuracy aside, depending on the context it can be legitimate to prefer a high precision over recall (or vice versa); that is not the case here, however, and we can assume both carry the same weight.
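A toy example (hypothetical labels, with 1 as the positive class) of how the three chosen metrics differ:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels: 1 = positive class ('T' wins the round).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]   # 2 TP, 1 FN, 1 FP, 2 TN

print(accuracy_score(y_true, y_pred))   # 4/6: fraction of correct predictions
print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 2/3
print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 2/3
```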
In order to present a general overview of the classification techniques proposed in machine learning, the types of models to be trained will be as diverse as possible, thus highlighting their strengths and weaknesses.
Random forest is a classification algorithm based on an ensemble of decision trees (where ensemble means a combination of multiple classifiers). More precisely, the ensemble technique it uses is bagging, which consists in drawing random samples from a dataset with replacement (i.e. the records drawn are not removed from it). Each sample involves a subset of both the rows and the columns of the original dataset, the goal being to build trees that are as uncorrelated as possible, so as to increase the number of correct predictions. When a new instance has to be classified, the class label is assigned as follows: each tree in the random forest makes its own prediction, and the prediction with the most votes wins, via a hard-voting mechanism.
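A small sketch of the majority-vote idea on synthetic data (not the CS:GO dataset). One caveat: scikit-learn's RandomForestClassifier actually averages the trees' class probabilities, but this coincides with a hard majority vote when the trees are grown to pure leaves, as happens with the default settings:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real dataset.
X, y = make_classification(n_samples=200, random_state=42)
forest = RandomForestClassifier(n_estimators=25, random_state=42).fit(X, y)

# Each tree casts a vote for the first sample; the majority wins.
votes = np.array([tree.predict(X[:1])[0] for tree in forest.estimators_])
majority = int(np.bincount(votes.astype(int)).argmax())
print(majority == int(forest.predict(X[:1])[0]))  # True
```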
from sklearn.ensemble import RandomForestClassifier
rnd_class = RandomForestClassifier(n_estimators=40, n_jobs=-1, random_state=42)
rnd_class.fit(X_train,y_train)
rnd_class_model_scores = compute_performance([rnd_class], X_train, y_train, scoring_funs)
std of 0.868185204015276 of type accuracy is 0.001173804363789565 std of 0.882830341576921 of type precision is 0.0030061596009319205 std of 0.8549677843378479 of type recall is 0.0023849277828769068
The baseline performance is already excellent; it is likely that hyperparameter optimization will not do much better. Since a random forest is a collection of decision trees, its hyperparameters are for the most part shared with them, with a few subtle differences.
rnd_class._get_param_names()
['bootstrap', 'ccp_alpha', 'class_weight', 'criterion', 'max_depth', 'max_features', 'max_leaf_nodes', 'max_samples', 'min_impurity_decrease', 'min_samples_leaf', 'min_samples_split', 'min_weight_fraction_leaf', 'n_estimators', 'n_jobs', 'oob_score', 'random_state', 'verbose', 'warm_start']
Except for n_estimators, which denotes the number of trees in the forest, the hyperparameters listed above will be discussed in detail for the decision tree. As for the most significant hyperparameters, they are the same as the decision tree's, with the addition of n_estimators.
most_important_params = {'max_depth', 'max_features', 'min_samples_leaf', 'min_samples_split', 'n_estimators'}
One clarification about the criterion used to evaluate node impurity: since random forests are very demanding in terms of resources, the Gini index will be used to speed up the learning phase. With that settled, let us examine how the classifier's performance behaves as each hyperparameter varies, keeping the others fixed.
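For reference, the Gini index of a node is 1 − Σₖ pₖ², which, unlike entropy, requires no logarithm and is therefore slightly cheaper to evaluate. A minimal illustrative helper (not part of the project's code):

```python
import numpy as np

def gini_impurity(labels):
    """Gini index of a node: 1 - sum_k p_k**2 (illustrative helper)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))

print(gini_impurity([0, 0, 1, 1]))  # 0.5: maximally impure binary node
print(gini_impurity([1, 1, 1, 1]))  # 0.0: pure node
```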
def plot_classifier_performance(param_type, param_values, classifier, X_train, y_train, scoring_fun):
    # Cross-validated score of the classifier for each value of the chosen hyperparameter.
    cross_val_scores = np.empty(len(param_values), dtype=float)
    for i in range(len(param_values)):
        classifier.set_params(**{param_type: param_values[i]})
        # Note: cross_val_score clones and refits the estimator on each fold,
        # so this explicit fit only keeps a fitted copy of the classifier around.
        classifier.fit(X_train, y_train)
        scores = cross_val_score(classifier, X_train, y_train, cv=4, scoring=scoring_fun)
        cross_val_scores[i] = scores.mean()
    plt.figure(figsize=(15, 7))
    plt.title(scoring_fun + " of classifier")
    plt.plot(param_values, cross_val_scores, color='forestgreen', linewidth=3,
             marker='o', markerfacecolor='darkorange', markersize=10)
    plt.xlabel(param_type)
    plt.ylabel('Scores')
max_depth_values = [int(x) for x in np.linspace(5, 110, num = 30)]
for fun in scoring_funs:
plot_classifier_performance('max_depth', max_depth_values, RandomForestClassifier(n_estimators=40, n_jobs=-1, random_state=42), X_train, y_train, fun)
As the plots show, the trends of the different score types are nearly identical, settling down for max_depth > 60. We also note that for parameter values below 20 the differences are more pronounced. It is imperative to choose a value that makes the model neither trivial nor overly complex (at the risk of overfitting). An optimal value would appear to be max_depth = 37.
min_samples_leaf_values = [int(x) for x in np.linspace(1, 20, num = 15)]
for fun in scoring_funs:
plot_classifier_performance('min_samples_leaf', min_samples_leaf_values, RandomForestClassifier(n_estimators=40, n_jobs=-1, random_state=42), X_train, y_train, fun)
The three plots highlight a rather peculiar case: the scores drop off sharply. This looks like underfitting (i.e. the model becomes too simple), and it happens because we are constraining how deep the trees can grow. The optimal value appears to lie in a neighborhood of 1.
min_samples_split_values = [int(x) for x in np.linspace(2, 40, num = 25)]
for fun in scoring_funs:
plot_classifier_performance('min_samples_split', min_samples_split_values, RandomForestClassifier(n_estimators=40, n_jobs=-1, random_state=42), X_train, y_train, fun)
The situation is analogous to the previous case. It is quite likely that, at this rate, no tuning phase will be needed: the hyperparameters' default values look like the optimal solution.
n_estimators_values = [int(x) for x in np.linspace(40, 600, num = 30)]
for fun in scoring_funs:
plot_classifier_performance('n_estimators', n_estimators_values, RandomForestClassifier(n_estimators=40, n_jobs=-1, random_state=42), X_train, y_train, fun)
This situation is quite interesting. Accuracy shows a fairly stable trend, while precision and recall are noticeably erratic, both exhibiting several spikes. Although it was not possible to run simulations beyond 600 because of limited hardware resources, the trend of the last two plots suggests some sort of convergence.
Considering that the performance differences remain minimal (hence negligible) as the hyperparameter grows, it makes sense to choose a range of n_estimators that keeps training times low; the interval [100, 200] seems reasonable.
random_forest_grid = {'n_estimators': [int(x) for x in np.linspace(start = 110, stop = 200, num = 20)],
'max_features':['sqrt', 'log2', None],
'max_depth': [37],
'bootstrap': [True]}
print(random_forest_grid)
{'n_estimators': [110, 114, 119, 124, 128, 133, 138, 143, 147, 152, 157, 162, 166, 171, 176, 181, 185, 190, 195, 200], 'max_features': ['sqrt', 'log2', None], 'max_depth': [37], 'bootstrap': [True]}
min_samples_split and min_samples_leaf were left at their default values, since those are optimal. As for the choice of algorithms for searching the optimal hyperparameter values, we identified two options:
Since the number of combinations is not high ($20 \cdot 3 \cdot 1 \cdot 1 = 60$), GridSearchCV appears to be a rational choice.
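The combination count can be double-checked with scikit-learn's ParameterGrid, applied to the same grid defined above:

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

random_forest_grid = {
    'n_estimators': [int(x) for x in np.linspace(start=110, stop=200, num=20)],
    'max_features': ['sqrt', 'log2', None],
    'max_depth': [37],
    'bootstrap': [True],
}
# ParameterGrid enumerates the Cartesian product of all listed values.
print(len(ParameterGrid(random_forest_grid)))  # 20 * 3 * 1 * 1 = 60
```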
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
rnd_forest_class = RandomForestClassifier(n_estimators=40, n_jobs=-1, random_state=42)
rnd_forest_grid_search = GridSearchCV(rnd_forest_class, random_forest_grid, verbose=3, cv=5)
rnd_forest_grid_search.fit(X_train, y_train)
Fitting 5 folds for each of 60 candidates, totalling 300 fits
[CV 1/5] END bootstrap=True, max_depth=37, max_features=sqrt, n_estimators=110;, score=0.875 total time= 4.2s
[CV 2/5] END bootstrap=True, max_depth=37, max_features=sqrt, n_estimators=110;, score=0.876 total time= 2.1s
[CV 3/5] END bootstrap=True, max_depth=37, max_features=sqrt, n_estimators=110;, score=0.879 total time= 2.4s
[CV 4/5] END bootstrap=True, max_depth=37, max_features=sqrt, n_estimators=110;, score=0.876 total time= 2.2s
[CV 5/5] END bootstrap=True, max_depth=37, max_features=sqrt, n_estimators=110;, score=0.873 total time= 2.1s
[... per-fold verbose output for the remaining candidates abridged; the fold scores shown range from 0.873 to 0.881 ...]
score=0.875 total time= 2.5s [CV 1/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=152;, score=0.876 total time= 2.5s [CV 2/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=152;, score=0.877 total time= 2.6s [CV 3/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=152;, score=0.882 total time= 2.6s [CV 4/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=152;, score=0.880 total time= 2.5s [CV 5/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=152;, score=0.876 total time= 2.6s [CV 1/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=157;, score=0.875 total time= 2.6s [CV 2/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=157;, score=0.877 total time= 2.7s [CV 3/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=157;, score=0.882 total time= 2.6s [CV 4/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=157;, score=0.881 total time= 2.7s [CV 5/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=157;, score=0.876 total time= 2.7s [CV 1/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=162;, score=0.875 total time= 2.7s [CV 2/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=162;, score=0.877 total time= 2.8s [CV 3/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=162;, score=0.882 total time= 2.7s [CV 4/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=162;, score=0.880 total time= 2.7s [CV 5/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=162;, score=0.877 total time= 2.7s [CV 1/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=166;, score=0.875 total time= 2.8s [CV 2/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=166;, score=0.877 total time= 2.8s [CV 3/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=166;, score=0.883 total 
time= 2.8s [CV 4/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=166;, score=0.880 total time= 2.8s [CV 5/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=166;, score=0.877 total time= 2.8s [CV 1/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=171;, score=0.876 total time= 2.9s [CV 2/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=171;, score=0.878 total time= 2.8s [CV 3/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=171;, score=0.882 total time= 3.2s [CV 4/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=171;, score=0.880 total time= 3.0s [CV 5/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=171;, score=0.877 total time= 3.0s [CV 1/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=176;, score=0.875 total time= 3.1s [CV 2/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=176;, score=0.878 total time= 3.0s [CV 3/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=176;, score=0.881 total time= 3.3s [CV 4/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=176;, score=0.881 total time= 3.6s [CV 5/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=176;, score=0.877 total time= 2.9s [CV 1/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=181;, score=0.875 total time= 3.5s [CV 2/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=181;, score=0.878 total time= 3.3s [CV 3/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=181;, score=0.881 total time= 3.5s [CV 4/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=181;, score=0.881 total time= 3.5s [CV 5/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=181;, score=0.877 total time= 3.2s [CV 1/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=185;, score=0.875 total time= 3.1s [CV 2/5] 
END bootstrap=True, max_depth=37, max_features=log2, n_estimators=185;, score=0.877 total time= 3.1s [CV 3/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=185;, score=0.881 total time= 3.2s [CV 4/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=185;, score=0.882 total time= 3.1s [CV 5/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=185;, score=0.877 total time= 3.1s [CV 1/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=190;, score=0.875 total time= 3.2s [CV 2/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=190;, score=0.878 total time= 3.1s [CV 3/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=190;, score=0.881 total time= 3.3s [CV 4/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=190;, score=0.881 total time= 3.2s [CV 5/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=190;, score=0.877 total time= 3.2s [CV 1/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=195;, score=0.875 total time= 3.5s [CV 2/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=195;, score=0.878 total time= 3.3s [CV 3/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=195;, score=0.881 total time= 3.3s [CV 4/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=195;, score=0.882 total time= 3.3s [CV 5/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=195;, score=0.877 total time= 3.2s [CV 1/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=200;, score=0.875 total time= 3.3s [CV 2/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=200;, score=0.878 total time= 3.3s [CV 3/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=200;, score=0.881 total time= 3.4s [CV 4/5] END bootstrap=True, max_depth=37, max_features=log2, n_estimators=200;, score=0.881 total time= 3.3s [CV 5/5] END bootstrap=True, 
max_depth=37, max_features=log2, n_estimators=200;, score=0.877 total time= 3.4s [CV 1/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=110;, score=0.854 total time= 8.3s [CV 2/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=110;, score=0.859 total time= 8.2s [CV 3/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=110;, score=0.861 total time= 8.3s [CV 4/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=110;, score=0.860 total time= 8.3s [CV 5/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=110;, score=0.859 total time= 8.2s [CV 1/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=114;, score=0.854 total time= 8.5s [CV 2/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=114;, score=0.859 total time= 8.5s [CV 3/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=114;, score=0.861 total time= 8.6s [CV 4/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=114;, score=0.860 total time= 8.5s [CV 5/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=114;, score=0.859 total time= 8.5s [CV 1/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=119;, score=0.855 total time= 8.9s [CV 2/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=119;, score=0.858 total time= 8.8s [CV 3/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=119;, score=0.861 total time= 8.8s [CV 4/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=119;, score=0.860 total time= 9.0s [CV 5/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=119;, score=0.860 total time= 8.9s [CV 1/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=124;, score=0.855 total time= 9.3s [CV 2/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=124;, score=0.859 total time= 9.2s [CV 3/5] END bootstrap=True, max_depth=37, 
max_features=None, n_estimators=124;, score=0.861 total time= 9.2s [CV 4/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=124;, score=0.860 total time= 9.3s [CV 5/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=124;, score=0.860 total time= 9.3s [CV 1/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=128;, score=0.855 total time= 9.6s [CV 2/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=128;, score=0.859 total time= 9.5s [CV 3/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=128;, score=0.861 total time= 9.5s [CV 4/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=128;, score=0.860 total time= 9.5s [CV 5/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=128;, score=0.860 total time= 9.6s [CV 1/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=133;, score=0.856 total time= 9.8s [CV 2/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=133;, score=0.859 total time= 9.9s [CV 3/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=133;, score=0.860 total time= 9.9s [CV 4/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=133;, score=0.860 total time= 9.9s [CV 5/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=133;, score=0.860 total time= 9.9s [CV 1/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=138;, score=0.856 total time= 10.6s [CV 2/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=138;, score=0.859 total time= 12.0s [CV 3/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=138;, score=0.860 total time= 11.7s [CV 4/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=138;, score=0.860 total time= 11.4s [CV 5/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=138;, score=0.860 total time= 10.2s [CV 1/5] END bootstrap=True, max_depth=37, 
max_features=None, n_estimators=143;, score=0.855 total time= 10.2s [CV 2/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=143;, score=0.860 total time= 10.2s [CV 3/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=143;, score=0.862 total time= 10.2s [CV 4/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=143;, score=0.860 total time= 10.2s [CV 5/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=143;, score=0.859 total time= 10.1s [CV 1/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=147;, score=0.855 total time= 10.7s [CV 2/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=147;, score=0.859 total time= 10.5s [CV 3/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=147;, score=0.862 total time= 10.4s [CV 4/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=147;, score=0.860 total time= 10.6s [CV 5/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=147;, score=0.860 total time= 10.5s [CV 1/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=152;, score=0.855 total time= 10.9s [CV 2/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=152;, score=0.860 total time= 10.7s [CV 3/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=152;, score=0.863 total time= 10.6s [CV 4/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=152;, score=0.859 total time= 10.8s [CV 5/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=152;, score=0.860 total time= 10.7s [CV 1/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=157;, score=0.856 total time= 11.2s [CV 2/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=157;, score=0.859 total time= 11.1s [CV 3/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=157;, score=0.862 total time= 11.0s [CV 4/5] END bootstrap=True, max_depth=37, 
max_features=None, n_estimators=157;, score=0.860 total time= 11.2s [CV 5/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=157;, score=0.860 total time= 11.1s [CV 1/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=162;, score=0.855 total time= 11.6s [CV 2/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=162;, score=0.860 total time= 11.5s [CV 3/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=162;, score=0.862 total time= 11.5s [CV 4/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=162;, score=0.860 total time= 11.6s [CV 5/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=162;, score=0.860 total time= 11.8s [CV 1/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=166;, score=0.855 total time= 11.8s [CV 2/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=166;, score=0.860 total time= 11.9s [CV 3/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=166;, score=0.862 total time= 11.7s [CV 4/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=166;, score=0.860 total time= 11.7s [CV 5/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=166;, score=0.860 total time= 11.9s [CV 1/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=171;, score=0.856 total time= 12.1s [CV 2/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=171;, score=0.859 total time= 12.1s [CV 3/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=171;, score=0.863 total time= 12.1s [CV 4/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=171;, score=0.860 total time= 12.1s [CV 5/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=171;, score=0.860 total time= 12.1s [CV 1/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=176;, score=0.856 total time= 12.3s [CV 2/5] END bootstrap=True, max_depth=37, 
max_features=None, n_estimators=176;, score=0.859 total time= 12.5s [CV 3/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=176;, score=0.862 total time= 12.4s [CV 4/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=176;, score=0.860 total time= 12.4s [CV 5/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=176;, score=0.860 total time= 14.0s [CV 1/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=181;, score=0.856 total time= 13.2s [CV 2/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=181;, score=0.859 total time= 12.8s [CV 3/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=181;, score=0.862 total time= 12.7s [CV 4/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=181;, score=0.860 total time= 12.8s [CV 5/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=181;, score=0.860 total time= 12.7s [CV 1/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=185;, score=0.856 total time= 13.2s [CV 2/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=185;, score=0.859 total time= 13.2s [CV 3/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=185;, score=0.862 total time= 13.1s [CV 4/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=185;, score=0.860 total time= 13.2s [CV 5/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=185;, score=0.860 total time= 13.0s [CV 1/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=190;, score=0.856 total time= 13.3s [CV 2/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=190;, score=0.859 total time= 13.2s [CV 3/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=190;, score=0.862 total time= 13.3s [CV 4/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=190;, score=0.860 total time= 13.9s [CV 5/5] END bootstrap=True, max_depth=37, 
max_features=None, n_estimators=190;, score=0.860 total time= 13.3s [CV 1/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=195;, score=0.856 total time= 13.7s [CV 2/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=195;, score=0.859 total time= 13.7s [CV 3/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=195;, score=0.863 total time= 13.7s [CV 4/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=195;, score=0.860 total time= 13.8s [CV 5/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=195;, score=0.860 total time= 13.7s [CV 1/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=200;, score=0.856 total time= 13.9s [CV 2/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=200;, score=0.859 total time= 14.0s [CV 3/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=200;, score=0.863 total time= 14.9s [CV 4/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=200;, score=0.861 total time= 14.2s [CV 5/5] END bootstrap=True, max_depth=37, max_features=None, n_estimators=200;, score=0.859 total time= 14.9s
GridSearchCV(cv=5,
             estimator=RandomForestClassifier(n_estimators=40, n_jobs=-1,
                                              random_state=42),
             param_grid={'bootstrap': [True], 'max_depth': [37],
                         'max_features': ['sqrt', 'log2', None],
                         'n_estimators': [110, 114, 119, 124, 128, 133, 138,
                                          143, 147, 152, 157, 162, 166, 171,
                                          176, 181, 185, 190, 195, 200]},
             verbose=3)
rnd_forest_grid_search.best_params_
rnd_forest_best_params = {'bootstrap': True,
                          'max_depth': 37,
                          'max_features': 'log2',
                          'n_estimators': 185}
rnd_forest_class_optimized = RandomForestClassifier()
rnd_forest_class_optimized.set_params(**rnd_forest_best_params)
rnd_forest_class_optimized.fit(X_train, y_train)
optimized_model_scores = compute_performance([rnd_forest_class_optimized], X_train, y_train, scoring_funs)
optimized_model_scores
std of 0.8790562133376095 of type accuracy is 0.0022438730662561975
std of 0.8897185583233449 of type precision is 0.003440010361054205
std of 0.8722254251253719 of type recall is 0.003255606017052361
{'RandomForestClassifier': {'accuracy': 0.8790562133376095,
'precision': 0.8897185583233449,
'recall': 0.8722254251253719}}
As was to be expected, the improvements turn out to be fairly marginal; it is probably not possible to do better in this setting.
K-nearest neighbors is a supervised classification algorithm whose predictions are based on the characteristics of the objects closest to the one under consideration. Given an instance to classify, the k objects nearest to it are identified (by computing the distances between each object and the instance at hand), and the instance is then assigned the most common label among those k neighbors. Since distances must be computed, it is essential to normalize the features first, so that no feature dominates the calculation simply because it varies over a wider range than the others. The normalization function used here is min-max scaling, with expression
$$ x_{normalized} = {x - x_{min} \over x_{max} - x_{min}},\quad x_{normalized} \in [0,1] $$

$x_{min}$: the minimum value taken by feature $x$

$x_{max}$: the maximum value taken by feature $x$
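The neighbor-voting mechanism described above can be sketched from scratch in a few lines; this is an illustrative toy version (the Euclidean distance and k=3 are arbitrary choices for the sketch, not the notebook's configuration):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Assign to x the majority label among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to every training point
    nearest = np.argsort(dists)[:k]              # indices of the k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny demo: two clusters, one per class
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_demo = np.array([0, 0, 1, 1])
print(knn_predict(X_demo, y_demo, np.array([0.2, 0.2])))  # -> 0
```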
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
normalized_X_train = DataFrame(scaler.fit_transform(X_train))
normalized_X_test = DataFrame(scaler.transform(X_test))  # transform only: the scaler must be fitted on the training set alone
knn_class = KNeighborsClassifier()
knn_class.fit(normalized_X_train,y_train)
knn_model_scores = compute_performance([knn_class], normalized_X_train, y_train, scoring_funs)
knn_model_scores
std of 0.8080076694090099 of type accuracy is 0.0032572972818018854
std of 0.8159817880726539 of type precision is 0.003933014713523534
std of 0.8050125269157429 of type recall is 0.003270098784866818
{'KNeighborsClassifier': {'accuracy': 0.8080076694090099,
'precision': 0.8159817880726539,
'recall': 0.8050125269157429}}
The baseline scores already look promising. Let us move on to identifying the hyperparameters.
knn_class.get_params()
{'algorithm': 'auto',
'leaf_size': 30,
'metric': 'minkowski',
'metric_params': None,
'n_jobs': None,
'n_neighbors': 5,
'p': 2,
'weights': 'uniform'}
Descriptions:
knn_most_important_params = {'n_neighbors', 'weights'}
The choice of hyperparameters was guided by the following assumptions:
Looking at the hyperparameters, the number of neighbors is without doubt the most important one, since it lies at the heart of the algorithm. It is therefore useful to plot the classification metrics as it varies; first, however, we must decide how many values it should take. Since the dataset examined in this work is large, and the algorithm will consume considerable computing resources, it is reasonable to let the hyperparameter take about twenty different values, enough to get a qualitative picture of the model's performance.
n_neighbors = [int(x) for x in np.linspace(3, 40, num = 20)]
knn_class = KNeighborsClassifier()
for fun in scoring_funs:
    plot_classifier_performance('n_neighbors', n_neighbors, knn_class, normalized_X_train, y_train, fun)
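plot_classifier_performance is a helper defined earlier in the notebook; its exact implementation is not shown in this section, but a minimal version of such a helper could look like the following (the signature and behaviour here are assumptions, not the notebook's actual code):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_score

def plot_classifier_performance(param_name, values, estimator, X, y, scoring, cv=5):
    """Plot the mean cross-validated score of `estimator` as one
    hyperparameter varies over `values`, and return the means."""
    means = []
    for v in values:
        estimator.set_params(**{param_name: v})
        means.append(cross_val_score(estimator, X, y, scoring=scoring, cv=cv).mean())
    plt.plot(values, means, marker="o")
    plt.xlabel(param_name)
    plt.ylabel(scoring)
    plt.show()
    return means
```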
Apparently a high number of neighbors only degrades performance. All three plots share a fairly pronounced spike at n_neighbors = 3, which appears to be the optimal value (slightly below the library default of 5). Such a low n_neighbors is a better situation than we expected: it keeps the cost of classifying new instances to a minimum while delivering optimal performance.
knn_grid_params = {'n_neighbors': [3],
                   'weights': ['uniform', 'distance'],
                   'metric': ['manhattan']}
knn_grid_params
{'n_neighbors': [3],
'weights': ['uniform', 'distance'],
'metric': ['manhattan']}
As the previous plots show, it would be pointless to let n_neighbors vary over a range of values, its optimal value having already been found. The metric was fixed to the Manhattan distance as previously discussed; what remains is to vary the type of weight function. We can once again use GridSearch, since the number of combinations is minimal.
knn_class_optimized = KNeighborsClassifier(n_jobs=-1)
grid_search_knn = GridSearchCV(knn_class_optimized, knn_grid_params, cv = 5, verbose=3)
grid_search_knn.fit(normalized_X_train,y_train)
grid_search_knn.best_params_
Fitting 5 folds for each of 2 candidates, totalling 10 fits
[CV 1/5] END metric=manhattan, n_neighbors=3, weights=uniform;, score=0.856 total time= 1.2min
[CV 2/5] END metric=manhattan, n_neighbors=3, weights=uniform;, score=0.852 total time= 1.2min
[CV 3/5] END metric=manhattan, n_neighbors=3, weights=uniform;, score=0.855 total time= 1.3min
[CV 4/5] END metric=manhattan, n_neighbors=3, weights=uniform;, score=0.860 total time= 1.2min
[CV 5/5] END metric=manhattan, n_neighbors=3, weights=uniform;, score=0.853 total time= 1.3min
[CV 1/5] END metric=manhattan, n_neighbors=3, weights=distance;, score=0.877 total time= 1.3min
[CV 2/5] END metric=manhattan, n_neighbors=3, weights=distance;, score=0.877 total time= 1.2min
[CV 3/5] END metric=manhattan, n_neighbors=3, weights=distance;, score=0.878 total time= 1.2min
[CV 4/5] END metric=manhattan, n_neighbors=3, weights=distance;, score=0.882 total time= 1.2min
[CV 5/5] END metric=manhattan, n_neighbors=3, weights=distance;, score=0.875 total time= 1.3min
{'metric': 'manhattan', 'n_neighbors': 3, 'weights': 'distance'}
knn_best_params = {'metric': 'manhattan', 'n_neighbors': 3, 'weights': 'distance'}
knn_class_optimized = KNeighborsClassifier(n_jobs=-1)
knn_class_optimized.set_params(**knn_best_params)
knn_class_optimized.fit(normalized_X_train, y_train)
knn_optimized_model_scores = compute_performance([knn_class_optimized], normalized_X_train, y_train, scoring_funs)
knn_optimized_model_scores
std of 0.875567009592878 of type accuracy is 0.002071155888143453
std of 0.8822105179010832 of type precision is 0.0043262493771545766
std of 0.8724819060996699 of type recall is 0.0037521929169129667
{'KNeighborsClassifier': {'accuracy': 0.875567009592878,
'precision': 0.8822105179010832,
'recall': 0.8724819060996699}}
Overall, tuning improved each metric by roughly 7 percentage points (accuracy, for instance, rose from about 0.808 to 0.876), even though the baseline scores were already in an excellent range.
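The size of the gain can be checked directly from the two score dictionaries printed above (values rounded here):

```python
# Scores copied (rounded) from the baseline and tuned KNN results above.
baseline = {'accuracy': 0.8080, 'precision': 0.8160, 'recall': 0.8050}
tuned    = {'accuracy': 0.8756, 'precision': 0.8822, 'recall': 0.8725}

# Per-metric gain in percentage points
gains = {m: round((tuned[m] - baseline[m]) * 100, 1) for m in baseline}
print(gains)
```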
At first sight one might think this classifier is a model with a completely different underlying logic from those presented in the literature; this is not the case. SGD simply takes linear classifiers (such as logistic regression, support vector machines, etc.) and trains them by minimizing the objective function via stochastic gradient descent. The label "stochastic" comes from the following fact: in contrast with gradient descent, which optimizes the loss function over the entire dataset, SGD randomly selects a single instance of the dataset (which might even be an outlier) on which to base each optimization step. This makes the algorithm markedly faster than the original, at the price of a noisier, less regular convergence.
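The per-instance update described above can be sketched in a few lines of NumPy. This is an illustrative toy implementation of SGD on the logistic loss, not scikit-learn's SGDClassifier; the data are synthetic:

```python
import numpy as np

def sgd_epoch(w, X, y, lr=0.01):
    """One epoch of stochastic gradient descent on the logistic loss:
    the weights are updated after each single, randomly chosen instance,
    instead of averaging the gradient over the whole dataset as batch
    gradient descent would."""
    for i in np.random.permutation(len(X)):
        margin = y[i] * (X[i] @ w)                    # labels y in {-1, +1}
        grad = -y[i] * X[i] / (1.0 + np.exp(margin))  # d/dw log(1 + exp(-margin))
        w -= lr * grad
    return w

# Two noisy clusters around (-1.5, -1.5) and (+1.5, +1.5)
rng = np.random.default_rng(0)
y = rng.choice([-1.0, 1.0], size=200)
X = y[:, None] * 1.5 + rng.normal(size=(200, 2))

np.random.seed(0)
w = np.zeros(2)
for _ in range(20):
    w = sgd_epoch(w, X, y)
accuracy = np.mean(np.sign(X @ w) == y)
print(accuracy)
```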
Before computing the classifier's baseline performance, it is essential to standardize the whole dataset, since gradient descent is sensitive to features that vary over different ranges (without standardization, the learning rate would be too low for some features and too high for the rest). Standardization has the following analytical expression
$$ z = {x - u \over s} $$

$u$: mean of the feature's values

$s$: standard deviation of the feature
from sklearn.preprocessing import StandardScaler

standard_scaler = StandardScaler()
scaled_X_train = DataFrame(standard_scaler.fit_transform(X_train))
scaled_X_test = DataFrame(standard_scaler.transform(X_test))  # transform only: fit on the training set alone
from sklearn.calibration import CalibratedClassifierCV
sgd_class = SGDClassifier(n_jobs=-1)
sgd_class_with_proba = CalibratedClassifierCV(sgd_class)
sgd_clf = sgd_class_with_proba.fit(scaled_X_train, y_train)
clf_model_scores = compute_performance([sgd_clf], scaled_X_train, y_train, scoring_funs)
std of 0.7400122882838598 of type accuracy is 0.0039955718967190895
std of 0.7482556335081396 of type precision is 0.005507310656809974
std of 0.7342714884873378 of type recall is 0.00790427150417793
Compared with the previous scores, these sit in the lowest band seen so far. Although not optimal, they are still sufficient as a baseline on which to attempt an improvement.
sgd_clf.get_params()
{'base_estimator__alpha': 0.0001,
'base_estimator__average': False,
'base_estimator__class_weight': None,
'base_estimator__early_stopping': False,
'base_estimator__epsilon': 0.1,
'base_estimator__eta0': 0.0,
'base_estimator__fit_intercept': True,
'base_estimator__l1_ratio': 0.15,
'base_estimator__learning_rate': 'optimal',
'base_estimator__loss': 'hinge',
'base_estimator__max_iter': 1000,
'base_estimator__n_iter_no_change': 5,
'base_estimator__n_jobs': -1,
'base_estimator__penalty': 'l2',
'base_estimator__power_t': 0.5,
'base_estimator__random_state': None,
'base_estimator__shuffle': True,
'base_estimator__tol': 0.001,
'base_estimator__validation_fraction': 0.1,
'base_estimator__verbose': 0,
'base_estimator__warm_start': False,
'base_estimator': SGDClassifier(n_jobs=-1),
'cv': None,
'ensemble': True,
'method': 'sigmoid',
'n_jobs': None}
The large number of hyperparameters adds considerable difficulty to the tuning phase, because we must discern which of them actually matter for improving the model's performance. After careful consideration, the following were selected:
loss: the objective function that will be optimized. Different functions correspond to different linear classifiers.
penalty: the regularization term to use.
alpha: constant multiplying the regularization term. The higher the value, the stronger the regularization.
fit_intercept: whether the intercept should be estimated or not.
max_iter: the maximum number of epochs (i.e. how many passes over the entire training set).
learning_rate: the step size used by the algorithm.
early_stopping: whether to stop the algorithm early when the validation score stops improving.
The loss parameter will be left at its default value, corresponding to the objective function used in SVMs, as will the penalty (which is the standard choice for SVM-based models).
max_iter_values = [int(x) for x in np.linspace(1000, 10000, num = 100)]
sgd_class = SGDClassifier(n_jobs=-1)
for fun in scoring_funs:
    plot_classifier_performance('max_iter', max_iter_values, sgd_class, scaled_X_train, y_train, fun)
The oscillating behaviour is characteristic of stochastic gradient descent: since an instance is chosen at random at each iteration, the resulting classification error may be higher or lower depending on that choice. The up-and-down trend shown in the plots is therefore to be expected.
sgd_grid_params = {'fit_intercept': [True, False],
                   'tol': [1e-3, 1e-2, 1e-1],
                   'max_iter': [int(x) for x in np.linspace(1000, 10000, num=50)],
                   'learning_rate': ['optimal'],
                   'early_stopping': [True, False]}
sgd_class_optimized = SGDClassifier(n_jobs=-1)
grid_search_sgd = GridSearchCV(sgd_class_optimized, sgd_grid_params, cv = 5, verbose=0)
grid_search_sgd.fit(scaled_X_train,y_train)
grid_search_sgd.best_params_
{'early_stopping': False,
'fit_intercept': False,
'learning_rate': 'optimal',
'max_iter': 5408,
'tol': 0.001}
sgd_best_params = {'early_stopping': False,
                   'fit_intercept': True,
                   'learning_rate': 'optimal',
                   'max_iter': 6326,
                   'tol': 0.001}
sgd_class_optimized = SGDClassifier()
sgd_class_optimized.set_params(**sgd_best_params)
sgd_class_optimized.fit(scaled_X_train, y_train)
sgd_optimized_model_scores = compute_performance([sgd_class_optimized], scaled_X_train, y_train, scoring_funs)
sgd_optimized_model_scores
std of 0.7301117120551412 of type accuracy is 0.003578064807562518 std of 0.7479043176530001 of type precision is 0.012370364983610422 std of 0.7220620259991692 of type recall is 0.033486218327244896
{'SGDClassifier': {'accuracy': 0.7301117120551412,
'precision': 0.7479043176530001,
'recall': 0.7220620259991692}}
The performance has worsened noticeably compared to the initial model. It is unclear whether a search over a wider subset of the hyperparameters (compared to those chosen above) would bring improvements.
This is a type of probabilistic classifier based on Bayes' theorem, under a strong independence assumption between the features (hence the name "naive"). Bayes' theorem states that the conditional probability P(Y | X), i.e. the probability that Y occurs given that X has already occurred, is given by:
$$P(Y | X) = {P(X | Y) P(Y) \over P(X) } $$
where
Both P(Y) and P(X) can be estimated from the dataset. As for P(X | Y), under the strong independence assumption between the variables the computation reduces to the following product:
$$ P(X_1,..,X_d | Y) = P(X_1 | Y) P(X_2 | Y) ... P(X_d | Y) $$
The new instance is classified as Y if the probability shown above is maximal. The problem, however, is how to estimate this value. If the data were categorical we could estimate the conditional probability for each value, but since some of the values are continuous we must proceed in other ways, namely the following:
The first approach was implemented assuming that the attributes follow a normal distribution (given what we learned during exploration about the attribute distributions, this hypothesis is not very strong).
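Before delegating the estimation to scikit-learn, the Gaussian variant of the product rule can be sketched directly. This is a minimal illustration: the per-class means, variances, and the test instance below are made-up numbers, not values estimated from our training set.

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return exp(-((x - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)

# Illustrative per-class statistics for two continuous features
# (made-up numbers, not estimated from the real training set).
stats = {
    'CT': {'prior': 0.5, 'means': [100.0, 80.0], 'vars': [400.0, 225.0]},
    'T':  {'prior': 0.5, 'means': [60.0, 50.0],  'vars': [400.0, 225.0]},
}

def naive_gaussian_score(label, xs):
    """Unnormalized posterior: P(Y) * prod_i N(x_i; mean_i, var_i)."""
    s = stats[label]
    score = s['prior']
    for x, m, v in zip(xs, s['means'], s['vars']):
        score *= gaussian_pdf(x, m, v)
    return score

instance = [95.0, 75.0]
prediction = max(stats, key=lambda lab: naive_gaussian_score(lab, instance))
print(prediction)  # 'CT': the instance lies far closer to the CT means
```

This is exactly the computation GaussianNB performs, except that it estimates the per-class means and variances from the training data.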
from sklearn.naive_bayes import GaussianNB,MultinomialNB
gnb_class = GaussianNB()
gnb_class.fit(X_train,y_train)
gnb_class_model_scores = compute_performance([gnb_class], X_train, y_train, scoring_funs)
std of 0.7139306093893942 of type accuracy is 0.004518283952704165 std of 0.722655239832979 of type precision is 0.0054532880487088535 std of 0.7123948053245132 of type recall is 0.005048686512961746
The results are decidedly worse than those obtained with the other classifiers. Let us try the first solution, which required discretizing the attributes.
One issue to consider is that we do not know the number of bins into which the intervals should be divided.
First, the attributes to discretize were identified ("to_discretize"); then we searched for the optimal number of bins with a random exploration of the search space (an exhaustive search would have required too many computational and time resources). The results of the random search were then adjusted individually based on the neighbourhood in which they were defined.
The function used for this step is pandas' "cut". Other solutions were considered, such as KBinsDiscretizer or qcut; however, in this context cut showed the best performance.
to_discretize=['time_left', 'ct_score', 'ct_armor', 't_armor',
'ct_money', 't_money','score_difference']
bins={'ct_armor': {'bins': 154},
'ct_money': {'bins': 83},
'ct_score': {'bins': 27},
'score_difference': {'bins': 165},
't_armor': {'bins': 154},
't_money': {'bins': 112},
'time_left': {'bins': 55}}
discretized_X_train = X_train.copy()
discretized_X_test = X_test.copy()
for c in to_discretize:
n_bins=bins[c]['bins']
discretized_X_train[c]=ps.cut(discretized_X_train[c],n_bins)
discretized_X_test[c]=ps.cut(discretized_X_test[c],n_bins)
encode_categorical(discretized_X_train)
encode_categorical(discretized_X_test)
mnb_class = MultinomialNB()
mnb_class.fit(discretized_X_train,y_train)
mnb_class_model_scores = compute_performance([mnb_class], discretized_X_train, y_train, scoring_funs)
std of 0.7403938969272057 of type accuracy is 0.00341010544850502 std of 0.7426102007589231 of type precision is 0.0045086085004749496 std of 0.7512724150757611 of type recall is 0.0021329984114494675
We notice a clear improvement over the first approach, although performance in general remains below that of the other classifiers, with the exception of SGD.
Since the classifier has no hyperparameters to optimize, the performance remains as shown above.
The decision tree is a supervised classification technique. Its goal is to build a trained model that can be used to predict the class or value of the target variable by learning simple decision rules inferred from the training set. The tree is structured as follows:
Although several types of learning algorithms for building a decision tree have been proposed in the machine learning community, in general the assignment of a class label to a new instance starts from the root of the tree, comparing the value of an attribute of the instance to be classified with the one specified at the root. Depending on the outcome of the comparison, a particular branch of the tree is followed, repeating the process iteratively until a leaf node is reached, which assigns the class label to the instance. Setting the individual learning algorithms aside, they all share three choices that affect the classifier's performance:
Split condition : depends on the attribute type (binary, nominal, ordinal, continuous) and on the number of ways (i.e., the number of branches departing from the node that can be followed).
Criterion defining the best split : depends on the purity of a node; it is therefore necessary to define measures to evaluate node impurity. Examples are the Gini index and entropy.
Criterion for stopping the splitting : either keep splitting until pure nodes are reached (however, the model could become too complex and prone to overfitting), or stop the process according to a specific criterion (for example, when the number of records falls below a certain threshold).
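The two impurity measures just mentioned can be sketched in a few lines of pure Python (the class counts below are chosen only for illustration):

```python
from math import log2

def gini(counts):
    """Gini index: 1 - sum_c p_c^2 (0 for a pure node, 0.5 max for a binary one)."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    """Entropy: -sum_c p_c * log2(p_c), skipping empty classes."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

print(gini([50, 50]))     # 0.5 -> maximally impure binary node
print(gini([100, 0]))     # 0.0 -> pure node
print(entropy([50, 50]))  # 1.0 -> maximally impure binary node
```

A split is good when it moves from a high-impurity parent node to children whose (weighted) impurity is lower.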
With this brief overview of the algorithm complete, let us evaluate the baseline performance of the classifier.
from sklearn.tree import DecisionTreeClassifier
decision_tree_class = DecisionTreeClassifier(random_state=42)
decision_tree_class.fit(X_train,y_train)
compute_performance([decision_tree_class], X_train, y_train, scoring_funs)
std of 0.8049545981503565 of type accuracy is 0.002315952213951069 std of 0.8090685714017329 of type precision is 0.00262838088274609 std of 0.8081988396014254 of type recall is 0.0026467512212156085
{'DecisionTreeClassifier': {'accuracy': 0.8049545981503565,
'precision': 0.8090685714017329,
'recall': 0.8081988396014254}}
Since the measures are computed for each fold, recall that those shown above are the averages over all folds. That said, we can already see good performance with the default hyperparameter values. Since the variance of the measures is fairly low, we can assume that the scores are quite close to each other across folds. We now proceed to identify the hyperparameters to optimize.
decision_tree_class._get_param_names()
The descriptions are listed below:
class_weight : weights associated with the classes.
criterion : criterion used to evaluate the impurity of a node.
max_depth : the maximum depth the trees may have. It is a crucially important parameter: the deeper the trees, the more complex the model becomes, causing overfitting and making the model unable to generalize. At the opposite extreme, if the tree height is too low, the model risks underfitting, becoming too simplistic.
max_features : number of attributes to consider when looking for the best split.
max_leaf_nodes : maximum number of leaf nodes the sub-trees may have.
min_impurity_decrease : a node will be split if the split induces a decrease of the impurity greater than or equal to this value.
min_samples_leaf : the minimum number of samples required for a node to be a leaf. A split at any depth will only be considered if both resulting children (left and right) contain at least this many samples. Like max_depth, this parameter is used to control tree overfitting.
min_samples_split : minimum number of samples a node must contain for it to be split. It too affects how deep the tree grows.
min_weight_fraction_leaf : the minimum weighted fraction of the total sum of weights (of all input samples) required to be at a leaf node.
splitter : the strategy used to choose the split at each node.
It has been observed empirically that the hyperparameters with the greatest impact on performance are those listed below
dec_tre_most_important_params = {'max_depth', 'max_features', 'min_samples_leaf', 'min_samples_split'}
Those excluded will be fixed at predefined values. As a next step, we plot the classifier's performance as each of the hyperparameters listed above varies, keeping the remaining ones fixed (at their defaults).
max_depth_values = [int(x) for x in np.linspace(4, 50, num = 25)]
for fun in scoring_funs:
plot_classifier_performance('max_depth', max_depth_values, DecisionTreeClassifier(random_state=42), X_train, y_train, fun)
As the depth of the tree increases we can see a considerable improvement in the model's performance; we may be in the presence of overfitting, so the value of max_depth must be chosen carefully.
min_samples_split_values = [int(x) for x in np.linspace(30, 50, num = 15)]
for fun in scoring_funs:
plot_classifier_performance('min_samples_split', min_samples_split_values, DecisionTreeClassifier(random_state=42), X_train, y_train, fun)
min_samples_leaf_values = [int(x) for x in np.linspace(3, 30, num = 25)]
for fun in scoring_funs:
plot_classifier_performance('min_samples_leaf', min_samples_leaf_values, DecisionTreeClassifier(random_state=42), X_train, y_train, fun)
This last triple of plots confirms that performance is not optimal when compared to the baseline configuration; nevertheless, remember that the classifier with default values was not limited in its depth growth, leaving room for the possibility that overfitting occurred. Judging from the plots, we would probably not obtain optimal results by starting the hyperparameter search with grid search or random search. Let us instead grow the classifier in depth without limits and then apply post-pruning, selecting accordingly an ideal value of the ccp_alpha hyperparameter.
dec_tree_class = DecisionTreeClassifier(random_state=42)
dec_tree_class.fit(X_train, y_train)
path = dec_tree_class.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities
ccp_alphas = np.random.choice(ccp_alphas, size=250, replace=False)
ccp_alphas = np.sort(ccp_alphas)
classifiers = []
for i in range(len(ccp_alphas)):
classifier = DecisionTreeClassifier(random_state=42, ccp_alpha=ccp_alphas[i])
classifier.fit(X_train, y_train)
classifiers.append(classifier)
def find_best_model_based_on_ccp(ccp_alphas, scoring_fun, classifiers):
train_scores = [0.0 for a in range(len(ccp_alphas))]
best_train_score = 0
best_class_index = 0
for i in range(len(train_scores)):
train_scores[i] = cross_val_score(classifiers[i], X_train, y_train, cv=4, scoring=scoring_fun).mean()
if train_scores[i] > best_train_score:
best_class_index = i
best_train_score = train_scores[i]
return classifiers[best_class_index], train_scores
def plot_ccp_vs_score(ccp_alphas, train_scores, score_fun):
plt.figure(figsize=(15,7))
plt.title(score_fun + " score vs alpha")
plt.plot(ccp_alphas, train_scores, label="cross_val", color='darkcyan', linewidth = 3, marker='o', markerfacecolor='darkorange', markersize=7)
plt.xlabel('alpha')
plt.ylabel(score_fun)
classifier, train_scores = find_best_model_based_on_ccp(ccp_alphas, 'roc_auc', classifiers)
plot_ccp_vs_score(ccp_alphas, train_scores, 'roc_auc')
compute_performance([classifier], X_train, y_train, scoring_funs=scoring_funs + ['roc_auc'])
mean 0.7823077115083217 of type accuracy is 0.0019970680789415255 mean 0.7891001037862184 of type precision is 0.005094683391495693 mean 0.7822165859053379 of type recall is 0.009373057210133305 mean 0.8653219696281225 of type roc_auc is 0.0017054522808940065
{'DecisionTreeClassifier': {'accuracy': 0.7823077115083217,
'precision': 0.7891001037862184,
'recall': 0.7822165859053379,
'roc_auc': 0.8653219696281225}}
While on the one hand we gained a larger area under the ROC curve, on the other we notice performance drops for the remaining three measures. Having had no luck with the roc_auc score, let us repeat the reasoning with the scores defined in scoring_funs.
best_classifiers = []
for i in range(len(scoring_funs)):
classifier, train_scores = find_best_model_based_on_ccp(ccp_alphas, scoring_funs[i], classifiers)
best_classifiers.append(classifier)  # fixed typo: "classifer" left the new model unused and appended a stale classifier
plot_ccp_vs_score(ccp_alphas, train_scores, scoring_funs[i])
The post-pruning strategy proved insufficient. As a last attempt, even though it is clear that we will not beat the performance of the classifier with default values, we use random search to find an optimal combination of the hyperparameters.
decision_tree_grid_params = { 'min_samples_split': [int(x) for x in np.linspace(start = 20, stop = 40, num = 10)] ,
'min_samples_leaf': [int(x) for x in np.linspace(start = 2, stop = 20, num = 10)],
'max_depth': [int(x) for x in np.linspace(start = 20, stop = 40, num = 10)],
'max_features': ['sqrt','log2', None]
}
As can be observed, the number of candidate models to evaluate is fairly high ($10 \cdot 10 \cdot 10 \cdot 3 = 3000$); considering moreover that each model will undergo k-fold cross-validation, the total must be multiplied by k (in our case k = 5), hence an exhaustive search would run training 15,000 times. Faced with this figure, using grid search would be madness, which is why random search was preferred.
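The arithmetic above can be checked directly; the list sizes below mirror those of decision_tree_grid_params defined earlier:

```python
# Sizes of the four hyperparameter lists in decision_tree_grid_params.
grid_sizes = [10, 10, 10, 3]  # min_samples_split, min_samples_leaf, max_depth, max_features
candidates = 1
for size in grid_sizes:
    candidates *= size

k = 5  # folds in the cross-validation
print(candidates, candidates * k)  # 3000 15000 -> fits an exhaustive grid search would need

# Random search instead caps the number of sampled candidates at n_iter:
n_iter = 250
print(n_iter * k)  # 1250 fits
```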
from sklearn.model_selection import RandomizedSearchCV
decision_tree_class_1 = DecisionTreeClassifier(random_state=42)
grid_search_decision_tree = RandomizedSearchCV(estimator = decision_tree_class_1, param_distributions = decision_tree_grid_params, n_iter = 250, cv = 5, verbose=2, random_state=42, n_jobs = -1)
grid_search_decision_tree.fit(X_train,y_train)
grid_search_decision_tree.best_params_
Fitting 5 folds for each of 250 candidates, totalling 1250 fits
[verbose per-fit timing log omitted]
max_features=sqrt, min_samples_leaf=18, min_samples_split=26; total time= 2.0s [CV] END max_depth=24, max_features=log2, min_samples_leaf=6, min_samples_split=35; total time= 2.3s [CV] END max_depth=22, max_features=sqrt, min_samples_leaf=20, min_samples_split=26; total time= 2.3s [CV] END max_depth=24, max_features=log2, min_samples_leaf=6, min_samples_split=35; total time= 2.3s [CV] END max_depth=24, max_features=log2, min_samples_leaf=6, min_samples_split=35; total time= 2.3s [CV] END max_depth=24, max_features=log2, min_samples_leaf=6, min_samples_split=35; total time= 2.4s [CV] END max_depth=24, max_features=log2, min_samples_leaf=6, min_samples_split=35; total time= 2.3s [CV] END max_depth=31, max_features=sqrt, min_samples_leaf=18, min_samples_split=26; total time= 1.0s [CV] END max_depth=31, max_features=sqrt, min_samples_leaf=18, min_samples_split=26; total time= 1.9s [CV] END max_depth=31, max_features=sqrt, min_samples_leaf=18, min_samples_split=26; total time= 1.9s [CV] END max_depth=31, max_features=sqrt, min_samples_leaf=10, min_samples_split=35; total time= 1.8s [CV] END max_depth=31, max_features=sqrt, min_samples_leaf=10, min_samples_split=35; total time= 1.8s [CV] END max_depth=31, max_features=sqrt, min_samples_leaf=10, min_samples_split=35; total time= 2.0s [CV] END max_depth=31, max_features=sqrt, min_samples_leaf=10, min_samples_split=35; total time= 2.1s [CV] END max_depth=31, max_features=sqrt, min_samples_leaf=10, min_samples_split=35; total time= 2.1s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=6, min_samples_split=31; total time= 1.3s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=6, min_samples_split=31; total time= 2.3s [CV] END max_depth=20, max_features=sqrt, min_samples_leaf=4, min_samples_split=28; total time= 2.1s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=6, min_samples_split=31; total time= 2.4s [CV] END max_depth=20, max_features=sqrt, min_samples_leaf=4, min_samples_split=28; total 
time= 2.1s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=6, min_samples_split=31; total time= 2.4s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=6, min_samples_split=31; total time= 2.4s [CV] END max_depth=20, max_features=sqrt, min_samples_leaf=4, min_samples_split=28; total time= 2.1s [CV] END max_depth=20, max_features=sqrt, min_samples_leaf=4, min_samples_split=28; total time= 2.4s [CV] END max_depth=37, max_features=log2, min_samples_leaf=14, min_samples_split=35; total time= 1.9s [CV] END max_depth=20, max_features=sqrt, min_samples_leaf=4, min_samples_split=28; total time= 2.0s [CV] END max_depth=37, max_features=log2, min_samples_leaf=14, min_samples_split=35; total time= 1.8s [CV] END max_depth=37, max_features=log2, min_samples_leaf=14, min_samples_split=35; total time= 1.9s [CV] END max_depth=37, max_features=log2, min_samples_leaf=14, min_samples_split=35; total time= 2.0s [CV] END max_depth=37, max_features=log2, min_samples_leaf=14, min_samples_split=35; total time= 2.0s [CV] END max_depth=31, max_features=log2, min_samples_leaf=2, min_samples_split=20; total time= 1.8s [CV] END max_depth=31, max_features=log2, min_samples_leaf=2, min_samples_split=20; total time= 2.0s [CV] END max_depth=26, max_features=log2, min_samples_leaf=20, min_samples_split=20; total time= 1.8s [CV] END max_depth=26, max_features=log2, min_samples_leaf=20, min_samples_split=20; total time= 2.1s [CV] END max_depth=31, max_features=log2, min_samples_leaf=2, min_samples_split=20; total time= 2.1s [CV] END max_depth=26, max_features=log2, min_samples_leaf=20, min_samples_split=20; total time= 1.8s [CV] END max_depth=26, max_features=log2, min_samples_leaf=20, min_samples_split=20; total time= 2.0s [CV] END max_depth=31, max_features=log2, min_samples_leaf=2, min_samples_split=20; total time= 2.2s [CV] END max_depth=26, max_features=log2, min_samples_leaf=20, min_samples_split=20; total time= 2.2s [CV] END max_depth=31, max_features=log2, 
min_samples_leaf=2, min_samples_split=20; total time= 2.2s [CV] END max_depth=35, max_features=sqrt, min_samples_leaf=4, min_samples_split=22; total time= 1.6s [CV] END max_depth=35, max_features=sqrt, min_samples_leaf=4, min_samples_split=22; total time= 1.7s [CV] END max_depth=35, max_features=sqrt, min_samples_leaf=4, min_samples_split=22; total time= 1.7s [CV] END max_depth=35, max_features=sqrt, min_samples_leaf=4, min_samples_split=22; total time= 1.7s [CV] END max_depth=35, max_features=sqrt, min_samples_leaf=4, min_samples_split=22; total time= 1.8s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=4, min_samples_split=37; total time= 1.0s [CV] END max_depth=20, max_features=None, min_samples_leaf=2, min_samples_split=26; total time= 2.5s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=4, min_samples_split=37; total time= 1.0s [CV] END max_depth=20, max_features=None, min_samples_leaf=2, min_samples_split=26; total time= 2.6s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=4, min_samples_split=37; total time= 1.1s [CV] END max_depth=20, max_features=None, min_samples_leaf=2, min_samples_split=26; total time= 2.7s [CV] END max_depth=20, max_features=None, min_samples_leaf=2, min_samples_split=26; total time= 2.1s [CV] END max_depth=20, max_features=None, min_samples_leaf=2, min_samples_split=26; total time= 2.2s [CV] END max_depth=20, max_features=log2, min_samples_leaf=12, min_samples_split=24; total time= 1.2s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=4, min_samples_split=37; total time= 1.3s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=4, min_samples_split=37; total time= 1.3s [CV] END max_depth=20, max_features=log2, min_samples_leaf=12, min_samples_split=24; total time= 1.3s [CV] END max_depth=20, max_features=log2, min_samples_leaf=12, min_samples_split=24; total time= 1.4s [CV] END max_depth=20, max_features=log2, min_samples_leaf=12, min_samples_split=24; total time= 1.4s [CV] END 
max_depth=20, max_features=log2, min_samples_leaf=12, min_samples_split=24; total time= 1.0s [CV] END max_depth=28, max_features=log2, min_samples_leaf=12, min_samples_split=40; total time= 1.3s [CV] END max_depth=28, max_features=log2, min_samples_leaf=12, min_samples_split=40; total time= 1.3s [CV] END max_depth=28, max_features=log2, min_samples_leaf=12, min_samples_split=40; total time= 0.9s [CV] END max_depth=40, max_features=None, min_samples_leaf=2, min_samples_split=22; total time= 1.9s [CV] END max_depth=28, max_features=log2, min_samples_leaf=12, min_samples_split=40; total time= 0.6s [CV] END max_depth=28, max_features=log2, min_samples_leaf=12, min_samples_split=40; total time= 0.7s [CV] END max_depth=24, max_features=log2, min_samples_leaf=16, min_samples_split=24; total time= 0.8s [CV] END max_depth=24, max_features=log2, min_samples_leaf=16, min_samples_split=24; total time= 0.8s [CV] END max_depth=40, max_features=None, min_samples_leaf=2, min_samples_split=22; total time= 2.7s [CV] END max_depth=24, max_features=log2, min_samples_leaf=16, min_samples_split=24; total time= 0.7s [CV] END max_depth=24, max_features=log2, min_samples_leaf=16, min_samples_split=24; total time= 0.7s [CV] END max_depth=40, max_features=None, min_samples_leaf=2, min_samples_split=22; total time= 3.0s [CV] END max_depth=40, max_features=None, min_samples_leaf=2, min_samples_split=22; total time= 3.0s [CV] END max_depth=40, max_features=None, min_samples_leaf=2, min_samples_split=22; total time= 2.9s [CV] END max_depth=24, max_features=log2, min_samples_leaf=16, min_samples_split=24; total time= 0.8s [CV] END max_depth=26, max_features=log2, min_samples_leaf=2, min_samples_split=22; total time= 1.1s [CV] END max_depth=26, max_features=log2, min_samples_leaf=2, min_samples_split=22; total time= 1.3s [CV] END max_depth=37, max_features=None, min_samples_leaf=16, min_samples_split=37; total time= 1.9s [CV] END max_depth=26, max_features=log2, min_samples_leaf=2, 
min_samples_split=22; total time= 1.1s [CV] END max_depth=37, max_features=None, min_samples_leaf=16, min_samples_split=37; total time= 1.9s [CV] END max_depth=26, max_features=log2, min_samples_leaf=2, min_samples_split=22; total time= 0.8s [CV] END max_depth=37, max_features=None, min_samples_leaf=16, min_samples_split=37; total time= 2.3s [CV] END max_depth=26, max_features=log2, min_samples_leaf=2, min_samples_split=22; total time= 1.0s [CV] END max_depth=37, max_features=None, min_samples_leaf=16, min_samples_split=37; total time= 2.4s [CV] END max_depth=24, max_features=log2, min_samples_leaf=4, min_samples_split=37; total time= 0.9s [CV] END max_depth=24, max_features=log2, min_samples_leaf=4, min_samples_split=37; total time= 1.0s [CV] END max_depth=24, max_features=log2, min_samples_leaf=4, min_samples_split=37; total time= 1.0s [CV] END max_depth=37, max_features=None, min_samples_leaf=16, min_samples_split=37; total time= 2.8s [CV] END max_depth=24, max_features=log2, min_samples_leaf=4, min_samples_split=37; total time= 1.2s [CV] END max_depth=24, max_features=log2, min_samples_leaf=4, min_samples_split=37; total time= 1.7s [CV] END max_depth=24, max_features=sqrt, min_samples_leaf=4, min_samples_split=24; total time= 1.8s [CV] END max_depth=24, max_features=sqrt, min_samples_leaf=4, min_samples_split=24; total time= 1.6s [CV] END max_depth=24, max_features=sqrt, min_samples_leaf=4, min_samples_split=24; total time= 1.6s [CV] END max_depth=24, max_features=sqrt, min_samples_leaf=4, min_samples_split=24; total time= 1.6s [CV] END max_depth=24, max_features=sqrt, min_samples_leaf=4, min_samples_split=24; total time= 1.4s [CV] END max_depth=24, max_features=None, min_samples_leaf=10, min_samples_split=24; total time= 2.3s [CV] END max_depth=24, max_features=None, min_samples_leaf=10, min_samples_split=24; total time= 1.8s [CV] END max_depth=24, max_features=None, min_samples_leaf=10, min_samples_split=24; total time= 2.0s [CV] END max_depth=24, 
max_features=None, min_samples_leaf=10, min_samples_split=24; total time= 2.3s [CV] END max_depth=20, max_features=None, min_samples_leaf=14, min_samples_split=33; total time= 2.2s [CV] END max_depth=20, max_features=None, min_samples_leaf=14, min_samples_split=33; total time= 2.2s [CV] END max_depth=20, max_features=None, min_samples_leaf=14, min_samples_split=33; total time= 2.2s [CV] END max_depth=24, max_features=None, min_samples_leaf=10, min_samples_split=24; total time= 2.4s [CV] END max_depth=20, max_features=sqrt, min_samples_leaf=12, min_samples_split=22; total time= 0.7s [CV] END max_depth=20, max_features=None, min_samples_leaf=14, min_samples_split=33; total time= 1.6s [CV] END max_depth=20, max_features=None, min_samples_leaf=14, min_samples_split=33; total time= 2.0s [CV] END max_depth=20, max_features=sqrt, min_samples_leaf=12, min_samples_split=22; total time= 1.2s [CV] END max_depth=20, max_features=sqrt, min_samples_leaf=12, min_samples_split=22; total time= 1.6s [CV] END max_depth=20, max_features=sqrt, min_samples_leaf=12, min_samples_split=22; total time= 1.8s [CV] END max_depth=20, max_features=sqrt, min_samples_leaf=12, min_samples_split=22; total time= 1.8s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=16, min_samples_split=20; total time= 0.8s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=16, min_samples_split=20; total time= 0.7s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=16, min_samples_split=20; total time= 0.7s [CV] END max_depth=26, max_features=None, min_samples_leaf=20, min_samples_split=31; total time= 2.2s [CV] END max_depth=26, max_features=None, min_samples_leaf=20, min_samples_split=31; total time= 2.2s [CV] END max_depth=26, max_features=None, min_samples_leaf=20, min_samples_split=31; total time= 2.2s [CV] END max_depth=26, max_features=None, min_samples_leaf=20, min_samples_split=31; total time= 2.0s [CV] END max_depth=26, max_features=None, min_samples_leaf=20, min_samples_split=31; 
total time= 3.2s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=16, min_samples_split=20; total time= 0.9s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=16, min_samples_split=20; total time= 1.0s [CV] END max_depth=31, max_features=sqrt, min_samples_leaf=18, min_samples_split=24; total time= 0.8s [CV] END max_depth=31, max_features=sqrt, min_samples_leaf=18, min_samples_split=24; total time= 0.9s [CV] END max_depth=31, max_features=sqrt, min_samples_leaf=18, min_samples_split=24; total time= 0.8s [CV] END max_depth=28, max_features=None, min_samples_leaf=20, min_samples_split=22; total time= 1.9s [CV] END max_depth=28, max_features=None, min_samples_leaf=20, min_samples_split=22; total time= 1.9s [CV] END max_depth=28, max_features=None, min_samples_leaf=20, min_samples_split=22; total time= 1.9s [CV] END max_depth=28, max_features=None, min_samples_leaf=20, min_samples_split=22; total time= 2.0s [CV] END max_depth=31, max_features=sqrt, min_samples_leaf=18, min_samples_split=24; total time= 0.9s [CV] END max_depth=31, max_features=sqrt, min_samples_leaf=18, min_samples_split=24; total time= 0.9s [CV] END max_depth=28, max_features=None, min_samples_leaf=20, min_samples_split=22; total time= 2.1s [CV] END max_depth=31, max_features=log2, min_samples_leaf=20, min_samples_split=28; total time= 0.7s [CV] END max_depth=31, max_features=log2, min_samples_leaf=20, min_samples_split=28; total time= 0.8s [CV] END max_depth=31, max_features=log2, min_samples_leaf=20, min_samples_split=28; total time= 0.7s [CV] END max_depth=22, max_features=None, min_samples_leaf=6, min_samples_split=22; total time= 2.1s [CV] END max_depth=22, max_features=None, min_samples_leaf=6, min_samples_split=22; total time= 2.1s [CV] END max_depth=31, max_features=log2, min_samples_leaf=20, min_samples_split=28; total time= 0.7s [CV] END max_depth=31, max_features=log2, min_samples_leaf=20, min_samples_split=28; total time= 0.6s [CV] END max_depth=22, max_features=None, 
min_samples_leaf=6, min_samples_split=22; total time= 2.1s [CV] END max_depth=22, max_features=None, min_samples_leaf=6, min_samples_split=22; total time= 2.2s [CV] END max_depth=22, max_features=None, min_samples_leaf=6, min_samples_split=22; total time= 2.2s [CV] END max_depth=40, max_features=sqrt, min_samples_leaf=10, min_samples_split=37; total time= 0.8s [CV] END max_depth=37, max_features=None, min_samples_leaf=14, min_samples_split=26; total time= 1.8s [CV] END max_depth=40, max_features=sqrt, min_samples_leaf=10, min_samples_split=37; total time= 0.7s [CV] END max_depth=40, max_features=sqrt, min_samples_leaf=10, min_samples_split=37; total time= 0.7s [CV] END max_depth=37, max_features=None, min_samples_leaf=14, min_samples_split=26; total time= 1.8s [CV] END max_depth=37, max_features=None, min_samples_leaf=14, min_samples_split=26; total time= 1.9s [CV] END max_depth=40, max_features=sqrt, min_samples_leaf=10, min_samples_split=37; total time= 0.8s [CV] END max_depth=37, max_features=None, min_samples_leaf=14, min_samples_split=26; total time= 2.0s [CV] END max_depth=37, max_features=None, min_samples_leaf=14, min_samples_split=26; total time= 2.2s [CV] END max_depth=33, max_features=log2, min_samples_leaf=6, min_samples_split=22; total time= 0.9s [CV] END max_depth=33, max_features=log2, min_samples_leaf=6, min_samples_split=22; total time= 0.9s [CV] END max_depth=40, max_features=sqrt, min_samples_leaf=10, min_samples_split=37; total time= 1.0s [CV] END max_depth=33, max_features=log2, min_samples_leaf=6, min_samples_split=22; total time= 1.0s [CV] END max_depth=20, max_features=log2, min_samples_leaf=2, min_samples_split=24; total time= 1.7s [CV] END max_depth=33, max_features=log2, min_samples_leaf=6, min_samples_split=22; total time= 1.8s [CV] END max_depth=20, max_features=log2, min_samples_leaf=2, min_samples_split=24; total time= 1.5s [CV] END max_depth=33, max_features=log2, min_samples_leaf=6, min_samples_split=22; total time= 1.8s [CV] END 
max_depth=20, max_features=log2, min_samples_leaf=2, min_samples_split=24; total time= 1.6s [CV] END max_depth=20, max_features=log2, min_samples_leaf=2, min_samples_split=24; total time= 1.5s [CV] END max_depth=24, max_features=sqrt, min_samples_leaf=20, min_samples_split=31; total time= 1.1s [CV] END max_depth=20, max_features=log2, min_samples_leaf=2, min_samples_split=24; total time= 1.6s [CV] END max_depth=24, max_features=sqrt, min_samples_leaf=20, min_samples_split=31; total time= 1.9s [CV] END max_depth=24, max_features=sqrt, min_samples_leaf=20, min_samples_split=31; total time= 2.0s [CV] END max_depth=24, max_features=sqrt, min_samples_leaf=20, min_samples_split=31; total time= 2.0s [CV] END max_depth=24, max_features=sqrt, min_samples_leaf=20, min_samples_split=31; total time= 2.0s [CV] END max_depth=40, max_features=sqrt, min_samples_leaf=6, min_samples_split=28; total time= 1.0s [CV] END max_depth=40, max_features=sqrt, min_samples_leaf=6, min_samples_split=28; total time= 1.0s [CV] END max_depth=40, max_features=sqrt, min_samples_leaf=6, min_samples_split=28; total time= 1.1s [CV] END max_depth=35, max_features=None, min_samples_leaf=8, min_samples_split=24; total time= 2.8s [CV] END max_depth=35, max_features=None, min_samples_leaf=8, min_samples_split=24; total time= 3.0s [CV] END max_depth=35, max_features=None, min_samples_leaf=8, min_samples_split=24; total time= 3.0s [CV] END max_depth=35, max_features=None, min_samples_leaf=8, min_samples_split=24; total time= 3.2s [CV] END max_depth=35, max_features=None, min_samples_leaf=8, min_samples_split=24; total time= 2.2s [CV] END max_depth=35, max_features=log2, min_samples_leaf=16, min_samples_split=24; total time= 1.5s [CV] END max_depth=35, max_features=log2, min_samples_leaf=16, min_samples_split=24; total time= 1.8s [CV] END max_depth=35, max_features=log2, min_samples_leaf=16, min_samples_split=24; total time= 1.8s [CV] END max_depth=40, max_features=sqrt, min_samples_leaf=6, 
min_samples_split=28; total time= 1.9s [CV] END max_depth=40, max_features=sqrt, min_samples_leaf=6, min_samples_split=28; total time= 1.9s [CV] END max_depth=35, max_features=log2, min_samples_leaf=16, min_samples_split=24; total time= 1.6s [CV] END max_depth=35, max_features=log2, min_samples_leaf=16, min_samples_split=24; total time= 1.5s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=18, min_samples_split=24; total time= 1.0s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=18, min_samples_split=24; total time= 1.1s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=18, min_samples_split=24; total time= 2.3s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=18, min_samples_split=24; total time= 2.3s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=10, min_samples_split=24; total time= 2.2s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=18, min_samples_split=24; total time= 2.3s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=10, min_samples_split=24; total time= 1.8s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=10, min_samples_split=24; total time= 1.4s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=10, min_samples_split=24; total time= 2.3s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=10, min_samples_split=24; total time= 2.0s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=8, min_samples_split=20; total time= 1.5s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=8, min_samples_split=20; total time= 1.6s [CV] END max_depth=24, max_features=log2, min_samples_leaf=20, min_samples_split=28; total time= 1.4s [CV] END max_depth=24, max_features=log2, min_samples_leaf=20, min_samples_split=28; total time= 1.5s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=8, min_samples_split=20; total time= 1.8s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=8, min_samples_split=20; total time= 1.8s [CV] END max_depth=24, 
max_features=log2, min_samples_leaf=20, min_samples_split=28; total time= 1.8s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=8, min_samples_split=20; total time= 1.9s [CV] END max_depth=24, max_features=log2, min_samples_leaf=20, min_samples_split=28; total time= 2.1s [CV] END max_depth=24, max_features=log2, min_samples_leaf=20, min_samples_split=28; total time= 2.0s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=6, min_samples_split=20; total time= 1.9s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=6, min_samples_split=20; total time= 1.9s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=6, min_samples_split=20; total time= 2.1s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=6, min_samples_split=20; total time= 1.8s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=6, min_samples_split=20; total time= 1.9s [CV] END max_depth=33, max_features=None, min_samples_leaf=2, min_samples_split=22; total time= 2.7s [CV] END max_depth=20, max_features=log2, min_samples_leaf=10, min_samples_split=28; total time= 1.1s [CV] END max_depth=20, max_features=log2, min_samples_leaf=10, min_samples_split=28; total time= 1.3s [CV] END max_depth=20, max_features=log2, min_samples_leaf=10, min_samples_split=28; total time= 1.2s [CV] END max_depth=33, max_features=None, min_samples_leaf=2, min_samples_split=22; total time= 2.1s [CV] END max_depth=20, max_features=log2, min_samples_leaf=10, min_samples_split=28; total time= 0.9s [CV] END max_depth=20, max_features=log2, min_samples_leaf=10, min_samples_split=28; total time= 0.9s [CV] END max_depth=37, max_features=log2, min_samples_leaf=6, min_samples_split=31; total time= 0.9s [CV] END max_depth=37, max_features=log2, min_samples_leaf=6, min_samples_split=31; total time= 0.9s [CV] END max_depth=33, max_features=None, min_samples_leaf=2, min_samples_split=22; total time= 2.9s [CV] END max_depth=37, max_features=log2, min_samples_leaf=6, min_samples_split=31; total time= 
0.8s [CV] END max_depth=37, max_features=log2, min_samples_leaf=6, min_samples_split=31; total time= 0.9s [CV] END max_depth=33, max_features=None, min_samples_leaf=2, min_samples_split=22; total time= 2.9s [CV] END max_depth=33, max_features=None, min_samples_leaf=2, min_samples_split=22; total time= 3.0s [CV] END max_depth=37, max_features=log2, min_samples_leaf=6, min_samples_split=31; total time= 0.8s [CV] END max_depth=28, max_features=log2, min_samples_leaf=8, min_samples_split=20; total time= 0.9s [CV] END max_depth=28, max_features=log2, min_samples_leaf=8, min_samples_split=20; total time= 0.9s [CV] END max_depth=28, max_features=log2, min_samples_leaf=8, min_samples_split=20; total time= 0.9s [CV] END max_depth=28, max_features=log2, min_samples_leaf=8, min_samples_split=20; total time= 1.4s [CV] END max_depth=28, max_features=log2, min_samples_leaf=8, min_samples_split=20; total time= 2.1s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=16, min_samples_split=35; total time= 2.0s [CV] END max_depth=40, max_features=log2, min_samples_leaf=18, min_samples_split=22; total time= 1.2s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=16, min_samples_split=35; total time= 2.3s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=16, min_samples_split=35; total time= 1.7s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=16, min_samples_split=35; total time= 2.1s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=16, min_samples_split=35; total time= 2.0s [CV] END max_depth=40, max_features=log2, min_samples_leaf=18, min_samples_split=22; total time= 1.0s [CV] END max_depth=40, max_features=log2, min_samples_leaf=18, min_samples_split=22; total time= 0.9s [CV] END max_depth=40, max_features=log2, min_samples_leaf=18, min_samples_split=22; total time= 1.9s [CV] END max_depth=40, max_features=log2, min_samples_leaf=18, min_samples_split=22; total time= 1.9s [CV] END max_depth=22, max_features=sqrt, min_samples_leaf=6, 
min_samples_split=22; total time= 1.9s [CV] END max_depth=22, max_features=sqrt, min_samples_leaf=6, min_samples_split=22; total time= 1.8s [CV] END max_depth=22, max_features=sqrt, min_samples_leaf=6, min_samples_split=22; total time= 1.8s [CV] END max_depth=22, max_features=sqrt, min_samples_leaf=6, min_samples_split=22; total time= 1.7s [CV] END max_depth=22, max_features=sqrt, min_samples_leaf=6, min_samples_split=22; total time= 1.7s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=14, min_samples_split=22; total time= 1.5s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=14, min_samples_split=22; total time= 1.5s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=14, min_samples_split=22; total time= 2.0s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=14, min_samples_split=22; total time= 2.1s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=4, min_samples_split=28; total time= 1.7s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=4, min_samples_split=28; total time= 1.9s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=4, min_samples_split=28; total time= 2.1s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=14, min_samples_split=22; total time= 2.2s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=4, min_samples_split=28; total time= 2.1s [CV] END max_depth=37, max_features=sqrt, min_samples_leaf=4, min_samples_split=28; total time= 1.0s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=14, min_samples_split=40; total time= 1.1s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=14, min_samples_split=40; total time= 2.0s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=14, min_samples_split=40; total time= 2.2s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=14, min_samples_split=40; total time= 2.2s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=14, min_samples_split=40; total time= 2.2s [CV] END max_depth=22, 
[Verbose cross-validation log truncated. Each `[CV] END ...` line reports the fit time of one fold of a 5-fold cross-validation for one sampled Random Forest configuration (every configuration appears five times). The sampled hyperparameters were: `max_depth` in 20–40, `max_features` in {sqrt, log2, None}, `min_samples_leaf` in 2–20, `min_samples_split` in 20–40; per-fold fit times ranged from about 0.6 s to 4.1 s, with `max_features=None` configurations consistently the slowest.]
max_depth=35, max_features=log2, min_samples_leaf=10, min_samples_split=26; total time= 2.6s [CV] END max_depth=35, max_features=log2, min_samples_leaf=10, min_samples_split=26; total time= 2.6s [CV] END max_depth=35, max_features=log2, min_samples_leaf=10, min_samples_split=26; total time= 2.6s [CV] END max_depth=31, max_features=None, min_samples_leaf=20, min_samples_split=26; total time= 3.5s [CV] END max_depth=31, max_features=None, min_samples_leaf=20, min_samples_split=26; total time= 3.5s [CV] END max_depth=31, max_features=None, min_samples_leaf=20, min_samples_split=26; total time= 3.4s [CV] END max_depth=31, max_features=None, min_samples_leaf=20, min_samples_split=26; total time= 3.7s [CV] END max_depth=31, max_features=None, min_samples_leaf=20, min_samples_split=26; total time= 2.1s [CV] END max_depth=22, max_features=None, min_samples_leaf=8, min_samples_split=31; total time= 2.1s [CV] END max_depth=22, max_features=None, min_samples_leaf=8, min_samples_split=31; total time= 2.1s [CV] END max_depth=22, max_features=None, min_samples_leaf=8, min_samples_split=31; total time= 2.1s [CV] END max_depth=22, max_features=None, min_samples_leaf=8, min_samples_split=31; total time= 1.9s [CV] END max_depth=22, max_features=None, min_samples_leaf=8, min_samples_split=31; total time= 2.1s [CV] END max_depth=31, max_features=None, min_samples_leaf=10, min_samples_split=33; total time= 2.1s [CV] END max_depth=31, max_features=None, min_samples_leaf=10, min_samples_split=33; total time= 1.8s [CV] END max_depth=22, max_features=log2, min_samples_leaf=14, min_samples_split=26; total time= 1.2s [CV] END max_depth=22, max_features=log2, min_samples_leaf=14, min_samples_split=26; total time= 1.0s [CV] END max_depth=31, max_features=None, min_samples_leaf=10, min_samples_split=33; total time= 2.1s [CV] END max_depth=22, max_features=log2, min_samples_leaf=14, min_samples_split=26; total time= 1.2s [CV] END max_depth=31, max_features=None, min_samples_leaf=10, 
min_samples_split=33; total time= 2.4s [CV] END max_depth=22, max_features=log2, min_samples_leaf=14, min_samples_split=26; total time= 1.3s [CV] END max_depth=22, max_features=log2, min_samples_leaf=14, min_samples_split=26; total time= 1.3s [CV] END max_depth=31, max_features=None, min_samples_leaf=10, min_samples_split=33; total time= 2.4s [CV] END max_depth=37, max_features=None, min_samples_leaf=16, min_samples_split=35; total time= 1.7s [CV] END max_depth=37, max_features=None, min_samples_leaf=16, min_samples_split=35; total time= 2.3s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=8, min_samples_split=40; total time= 1.2s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=8, min_samples_split=40; total time= 1.3s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=8, min_samples_split=40; total time= 1.3s [CV] END max_depth=37, max_features=None, min_samples_leaf=16, min_samples_split=35; total time= 2.1s [CV] END max_depth=37, max_features=None, min_samples_leaf=16, min_samples_split=35; total time= 2.5s [CV] END max_depth=37, max_features=None, min_samples_leaf=16, min_samples_split=35; total time= 2.4s [CV] END max_depth=40, max_features=log2, min_samples_leaf=14, min_samples_split=24; total time= 1.7s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=8, min_samples_split=40; total time= 2.5s [CV] END max_depth=40, max_features=log2, min_samples_leaf=14, min_samples_split=24; total time= 2.5s [CV] END max_depth=40, max_features=log2, min_samples_leaf=14, min_samples_split=24; total time= 2.4s [CV] END max_depth=40, max_features=log2, min_samples_leaf=14, min_samples_split=24; total time= 2.5s [CV] END max_depth=40, max_features=log2, min_samples_leaf=14, min_samples_split=24; total time= 1.3s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=8, min_samples_split=40; total time= 2.6s [CV] END max_depth=20, max_features=log2, min_samples_leaf=6, min_samples_split=28; total time= 1.3s [CV] END max_depth=20, 
max_features=log2, min_samples_leaf=16, min_samples_split=37; total time= 1.9s [CV] END max_depth=20, max_features=log2, min_samples_leaf=16, min_samples_split=37; total time= 2.1s [CV] END max_depth=20, max_features=log2, min_samples_leaf=6, min_samples_split=28; total time= 2.4s [CV] END max_depth=20, max_features=log2, min_samples_leaf=6, min_samples_split=28; total time= 2.4s [CV] END max_depth=20, max_features=log2, min_samples_leaf=6, min_samples_split=28; total time= 2.4s [CV] END max_depth=20, max_features=log2, min_samples_leaf=16, min_samples_split=37; total time= 2.3s [CV] END max_depth=20, max_features=log2, min_samples_leaf=6, min_samples_split=28; total time= 2.4s [CV] END max_depth=20, max_features=log2, min_samples_leaf=16, min_samples_split=37; total time= 2.4s [CV] END max_depth=20, max_features=log2, min_samples_leaf=20, min_samples_split=24; total time= 1.9s [CV] END max_depth=20, max_features=log2, min_samples_leaf=20, min_samples_split=24; total time= 1.9s [CV] END max_depth=20, max_features=log2, min_samples_leaf=16, min_samples_split=37; total time= 2.0s [CV] END max_depth=20, max_features=log2, min_samples_leaf=20, min_samples_split=24; total time= 1.8s [CV] END max_depth=20, max_features=log2, min_samples_leaf=20, min_samples_split=24; total time= 1.9s [CV] END max_depth=20, max_features=log2, min_samples_leaf=20, min_samples_split=24; total time= 2.1s [CV] END max_depth=35, max_features=None, min_samples_leaf=2, min_samples_split=35; total time= 3.0s [CV] END max_depth=35, max_features=None, min_samples_leaf=2, min_samples_split=35; total time= 3.0s [CV] END max_depth=35, max_features=None, min_samples_leaf=2, min_samples_split=35; total time= 2.4s [CV] END max_depth=35, max_features=None, min_samples_leaf=2, min_samples_split=35; total time= 2.4s [CV] END max_depth=35, max_features=None, min_samples_leaf=2, min_samples_split=35; total time= 2.5s [CV] END max_depth=28, max_features=None, min_samples_leaf=6, min_samples_split=40; total 
time= 2.4s [CV] END max_depth=28, max_features=None, min_samples_leaf=6, min_samples_split=40; total time= 2.3s [CV] END max_depth=28, max_features=None, min_samples_leaf=6, min_samples_split=40; total time= 2.2s [CV] END max_depth=31, max_features=log2, min_samples_leaf=20, min_samples_split=22; total time= 0.9s [CV] END max_depth=28, max_features=None, min_samples_leaf=6, min_samples_split=40; total time= 2.0s [CV] END max_depth=31, max_features=log2, min_samples_leaf=20, min_samples_split=22; total time= 1.0s [CV] END max_depth=28, max_features=None, min_samples_leaf=6, min_samples_split=40; total time= 2.1s [CV] END max_depth=31, max_features=log2, min_samples_leaf=20, min_samples_split=22; total time= 1.4s [CV] END max_depth=31, max_features=log2, min_samples_leaf=20, min_samples_split=22; total time= 1.7s [CV] END max_depth=24, max_features=log2, min_samples_leaf=10, min_samples_split=22; total time= 1.8s [CV] END max_depth=31, max_features=log2, min_samples_leaf=20, min_samples_split=22; total time= 2.0s [CV] END max_depth=24, max_features=log2, min_samples_leaf=10, min_samples_split=22; total time= 1.3s [CV] END max_depth=24, max_features=log2, min_samples_leaf=10, min_samples_split=22; total time= 1.5s [CV] END max_depth=24, max_features=log2, min_samples_leaf=10, min_samples_split=22; total time= 1.3s [CV] END max_depth=24, max_features=log2, min_samples_leaf=10, min_samples_split=22; total time= 1.4s [CV] END max_depth=26, max_features=log2, min_samples_leaf=8, min_samples_split=28; total time= 1.2s [CV] END max_depth=26, max_features=log2, min_samples_leaf=8, min_samples_split=28; total time= 0.9s [CV] END max_depth=26, max_features=log2, min_samples_leaf=8, min_samples_split=28; total time= 1.3s [CV] END max_depth=26, max_features=log2, min_samples_leaf=8, min_samples_split=28; total time= 2.1s [CV] END max_depth=26, max_features=log2, min_samples_leaf=8, min_samples_split=28; total time= 2.2s [CV] END max_depth=33, max_features=log2, 
min_samples_leaf=14, min_samples_split=33; total time= 1.8s [CV] END max_depth=33, max_features=log2, min_samples_leaf=14, min_samples_split=33; total time= 1.5s [CV] END max_depth=33, max_features=log2, min_samples_leaf=14, min_samples_split=33; total time= 2.1s [CV] END max_depth=33, max_features=log2, min_samples_leaf=14, min_samples_split=33; total time= 2.2s [CV] END max_depth=33, max_features=log2, min_samples_leaf=14, min_samples_split=33; total time= 2.2s [CV] END max_depth=22, max_features=log2, min_samples_leaf=16, min_samples_split=37; total time= 1.4s [CV] END max_depth=22, max_features=log2, min_samples_leaf=16, min_samples_split=37; total time= 1.1s [CV] END max_depth=22, max_features=log2, min_samples_leaf=16, min_samples_split=37; total time= 1.8s [CV] END max_depth=22, max_features=log2, min_samples_leaf=16, min_samples_split=37; total time= 1.9s [CV] END max_depth=26, max_features=log2, min_samples_leaf=6, min_samples_split=35; total time= 1.9s [CV] END max_depth=22, max_features=log2, min_samples_leaf=16, min_samples_split=37; total time= 2.2s [CV] END max_depth=26, max_features=log2, min_samples_leaf=6, min_samples_split=35; total time= 1.7s [CV] END max_depth=26, max_features=log2, min_samples_leaf=6, min_samples_split=35; total time= 2.1s [CV] END max_depth=26, max_features=log2, min_samples_leaf=6, min_samples_split=35; total time= 2.1s [CV] END max_depth=26, max_features=log2, min_samples_leaf=6, min_samples_split=35; total time= 1.2s [CV] END max_depth=22, max_features=sqrt, min_samples_leaf=14, min_samples_split=37; total time= 1.2s [CV] END max_depth=22, max_features=sqrt, min_samples_leaf=14, min_samples_split=37; total time= 1.3s [CV] END max_depth=22, max_features=sqrt, min_samples_leaf=14, min_samples_split=37; total time= 1.2s [CV] END max_depth=37, max_features=None, min_samples_leaf=2, min_samples_split=22; total time= 2.1s [CV] END max_depth=37, max_features=None, min_samples_leaf=2, min_samples_split=22; total time= 2.2s [CV] END 
max_depth=22, max_features=sqrt, min_samples_leaf=14, min_samples_split=37; total time= 1.0s [CV] END max_depth=22, max_features=sqrt, min_samples_leaf=14, min_samples_split=37; total time= 1.1s [CV] END max_depth=37, max_features=log2, min_samples_leaf=6, min_samples_split=26; total time= 1.1s [CV] END max_depth=37, max_features=log2, min_samples_leaf=6, min_samples_split=26; total time= 0.9s [CV] END max_depth=37, max_features=None, min_samples_leaf=2, min_samples_split=22; total time= 2.9s [CV] END max_depth=37, max_features=log2, min_samples_leaf=6, min_samples_split=26; total time= 1.0s [CV] END max_depth=37, max_features=None, min_samples_leaf=2, min_samples_split=22; total time= 2.9s [CV] END max_depth=37, max_features=None, min_samples_leaf=2, min_samples_split=22; total time= 2.9s [CV] END max_depth=37, max_features=log2, min_samples_leaf=6, min_samples_split=26; total time= 1.1s [CV] END max_depth=37, max_features=log2, min_samples_leaf=6, min_samples_split=26; total time= 1.6s [CV] END max_depth=24, max_features=None, min_samples_leaf=10, min_samples_split=31; total time= 2.4s [CV] END max_depth=24, max_features=None, min_samples_leaf=10, min_samples_split=31; total time= 2.2s [CV] END max_depth=24, max_features=None, min_samples_leaf=10, min_samples_split=31; total time= 2.4s [CV] END max_depth=24, max_features=None, min_samples_leaf=10, min_samples_split=31; total time= 2.4s [CV] END max_depth=20, max_features=None, min_samples_leaf=12, min_samples_split=33; total time= 2.3s [CV] END max_depth=24, max_features=None, min_samples_leaf=10, min_samples_split=31; total time= 2.4s [CV] END max_depth=20, max_features=None, min_samples_leaf=12, min_samples_split=33; total time= 2.0s [CV] END max_depth=20, max_features=None, min_samples_leaf=12, min_samples_split=33; total time= 1.9s [CV] END max_depth=20, max_features=None, min_samples_leaf=12, min_samples_split=33; total time= 1.7s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=6, 
min_samples_split=24; total time= 1.3s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=6, min_samples_split=24; total time= 1.3s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=6, min_samples_split=24; total time= 1.4s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=6, min_samples_split=24; total time= 1.2s [CV] END max_depth=33, max_features=log2, min_samples_leaf=20, min_samples_split=24; total time= 0.7s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=6, min_samples_split=24; total time= 1.3s [CV] END max_depth=20, max_features=None, min_samples_leaf=12, min_samples_split=33; total time= 2.3s [CV] END max_depth=33, max_features=log2, min_samples_leaf=20, min_samples_split=24; total time= 2.1s [CV] END max_depth=33, max_features=log2, min_samples_leaf=20, min_samples_split=24; total time= 2.1s [CV] END max_depth=33, max_features=log2, min_samples_leaf=20, min_samples_split=24; total time= 2.1s [CV] END max_depth=33, max_features=log2, min_samples_leaf=20, min_samples_split=24; total time= 2.1s [CV] END max_depth=20, max_features=None, min_samples_leaf=4, min_samples_split=22; total time= 3.1s [CV] END max_depth=20, max_features=None, min_samples_leaf=4, min_samples_split=22; total time= 3.1s [CV] END max_depth=20, max_features=None, min_samples_leaf=4, min_samples_split=22; total time= 2.6s [CV] END max_depth=20, max_features=None, min_samples_leaf=4, min_samples_split=22; total time= 3.2s [CV] END max_depth=37, max_features=None, min_samples_leaf=18, min_samples_split=40; total time= 2.2s [CV] END max_depth=20, max_features=None, min_samples_leaf=4, min_samples_split=22; total time= 2.2s [CV] END max_depth=37, max_features=None, min_samples_leaf=18, min_samples_split=40; total time= 2.2s [CV] END max_depth=37, max_features=None, min_samples_leaf=18, min_samples_split=40; total time= 2.2s [CV] END max_depth=33, max_features=log2, min_samples_leaf=10, min_samples_split=40; total time= 1.0s [CV] END max_depth=33, 
max_features=log2, min_samples_leaf=10, min_samples_split=40; total time= 1.1s [CV] END max_depth=37, max_features=None, min_samples_leaf=18, min_samples_split=40; total time= 1.9s [CV] END max_depth=33, max_features=log2, min_samples_leaf=10, min_samples_split=40; total time= 1.1s [CV] END max_depth=37, max_features=None, min_samples_leaf=18, min_samples_split=40; total time= 2.0s [CV] END max_depth=33, max_features=log2, min_samples_leaf=10, min_samples_split=40; total time= 1.4s [CV] END max_depth=28, max_features=log2, min_samples_leaf=14, min_samples_split=24; total time= 1.2s [CV] END max_depth=33, max_features=log2, min_samples_leaf=10, min_samples_split=40; total time= 1.5s [CV] END max_depth=28, max_features=log2, min_samples_leaf=14, min_samples_split=24; total time= 1.2s [CV] END max_depth=28, max_features=log2, min_samples_leaf=14, min_samples_split=24; total time= 1.4s [CV] END max_depth=28, max_features=log2, min_samples_leaf=14, min_samples_split=24; total time= 0.8s [CV] END max_depth=28, max_features=log2, min_samples_leaf=14, min_samples_split=24; total time= 0.9s [CV] END max_depth=35, max_features=sqrt, min_samples_leaf=12, min_samples_split=22; total time= 1.2s [CV] END max_depth=35, max_features=sqrt, min_samples_leaf=12, min_samples_split=22; total time= 2.0s [CV] END max_depth=35, max_features=sqrt, min_samples_leaf=12, min_samples_split=26; total time= 1.4s [CV] END max_depth=35, max_features=sqrt, min_samples_leaf=12, min_samples_split=26; total time= 1.5s [CV] END max_depth=35, max_features=sqrt, min_samples_leaf=12, min_samples_split=22; total time= 2.1s [CV] END max_depth=35, max_features=sqrt, min_samples_leaf=12, min_samples_split=26; total time= 2.0s [CV] END max_depth=35, max_features=sqrt, min_samples_leaf=12, min_samples_split=22; total time= 2.1s [CV] END max_depth=35, max_features=sqrt, min_samples_leaf=12, min_samples_split=22; total time= 2.2s [CV] END max_depth=35, max_features=sqrt, min_samples_leaf=12, min_samples_split=26; 
total time= 1.2s [CV] END max_depth=35, max_features=sqrt, min_samples_leaf=12, min_samples_split=26; total time= 2.1s [CV] END max_depth=35, max_features=log2, min_samples_leaf=2, min_samples_split=33; total time= 1.8s [CV] END max_depth=35, max_features=log2, min_samples_leaf=2, min_samples_split=33; total time= 2.0s [CV] END max_depth=35, max_features=log2, min_samples_leaf=2, min_samples_split=33; total time= 0.8s [CV] END max_depth=35, max_features=log2, min_samples_leaf=2, min_samples_split=33; total time= 0.8s [CV] END max_depth=35, max_features=log2, min_samples_leaf=2, min_samples_split=33; total time= 0.9s [CV] END max_depth=22, max_features=None, min_samples_leaf=14, min_samples_split=35; total time= 3.2s [CV] END max_depth=22, max_features=None, min_samples_leaf=14, min_samples_split=35; total time= 3.3s [CV] END max_depth=22, max_features=None, min_samples_leaf=14, min_samples_split=35; total time= 3.4s [CV] END max_depth=22, max_features=None, min_samples_leaf=14, min_samples_split=35; total time= 3.4s [CV] END max_depth=22, max_features=None, min_samples_leaf=14, min_samples_split=35; total time= 3.4s [CV] END max_depth=33, max_features=None, min_samples_leaf=20, min_samples_split=26; total time= 1.7s [CV] END max_depth=33, max_features=None, min_samples_leaf=20, min_samples_split=26; total time= 1.8s [CV] END max_depth=33, max_features=None, min_samples_leaf=20, min_samples_split=26; total time= 2.2s [CV] END max_depth=33, max_features=None, min_samples_leaf=20, min_samples_split=26; total time= 2.3s [CV] END max_depth=33, max_features=None, min_samples_leaf=20, min_samples_split=26; total time= 2.3s [CV] END max_depth=31, max_features=None, min_samples_leaf=4, min_samples_split=20; total time= 2.4s [CV] END max_depth=31, max_features=None, min_samples_leaf=4, min_samples_split=20; total time= 2.3s [CV] END max_depth=31, max_features=None, min_samples_leaf=4, min_samples_split=20; total time= 2.4s [CV] END max_depth=28, max_features=sqrt, 
min_samples_leaf=14, min_samples_split=26; total time= 0.6s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=14, min_samples_split=26; total time= 0.9s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=14, min_samples_split=26; total time= 0.9s [CV] END max_depth=31, max_features=None, min_samples_leaf=4, min_samples_split=20; total time= 2.3s [CV] END max_depth=31, max_features=None, min_samples_leaf=4, min_samples_split=20; total time= 2.4s [CV] END max_depth=35, max_features=sqrt, min_samples_leaf=6, min_samples_split=26; total time= 1.5s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=14, min_samples_split=26; total time= 1.6s [CV] END max_depth=28, max_features=sqrt, min_samples_leaf=14, min_samples_split=26; total time= 1.6s [CV] END max_depth=35, max_features=sqrt, min_samples_leaf=6, min_samples_split=26; total time= 1.6s [CV] END max_depth=35, max_features=sqrt, min_samples_leaf=6, min_samples_split=26; total time= 1.2s [CV] END max_depth=35, max_features=sqrt, min_samples_leaf=6, min_samples_split=26; total time= 1.3s [CV] END max_depth=24, max_features=log2, min_samples_leaf=20, min_samples_split=37; total time= 1.0s [CV] END max_depth=35, max_features=sqrt, min_samples_leaf=6, min_samples_split=26; total time= 1.1s [CV] END max_depth=24, max_features=log2, min_samples_leaf=20, min_samples_split=37; total time= 2.3s [CV] END max_depth=24, max_features=log2, min_samples_leaf=20, min_samples_split=37; total time= 2.4s [CV] END max_depth=24, max_features=log2, min_samples_leaf=20, min_samples_split=37; total time= 2.4s [CV] END max_depth=24, max_features=log2, min_samples_leaf=20, min_samples_split=37; total time= 2.4s [CV] END max_depth=20, max_features=sqrt, min_samples_leaf=18, min_samples_split=20; total time= 2.0s [CV] END max_depth=20, max_features=sqrt, min_samples_leaf=18, min_samples_split=20; total time= 2.2s [CV] END max_depth=20, max_features=sqrt, min_samples_leaf=18, min_samples_split=20; total time= 1.8s [CV] END 
max_depth=20, max_features=sqrt, min_samples_leaf=18, min_samples_split=20; total time= 1.9s [CV] END max_depth=22, max_features=log2, min_samples_leaf=4, min_samples_split=28; total time= 1.9s [CV] END max_depth=20, max_features=sqrt, min_samples_leaf=18, min_samples_split=20; total time= 2.3s [CV] END max_depth=20, max_features=sqrt, min_samples_leaf=8, min_samples_split=24; total time= 2.3s [CV] END max_depth=20, max_features=sqrt, min_samples_leaf=8, min_samples_split=24; total time= 2.2s [CV] END max_depth=22, max_features=log2, min_samples_leaf=4, min_samples_split=28; total time= 2.1s [CV] END max_depth=20, max_features=sqrt, min_samples_leaf=8, min_samples_split=24; total time= 2.2s [CV] END max_depth=20, max_features=sqrt, min_samples_leaf=8, min_samples_split=24; total time= 2.2s [CV] END max_depth=20, max_features=sqrt, min_samples_leaf=8, min_samples_split=24; total time= 2.2s [CV] END max_depth=22, max_features=log2, min_samples_leaf=4, min_samples_split=28; total time= 1.8s [CV] END max_depth=22, max_features=log2, min_samples_leaf=4, min_samples_split=28; total time= 1.8s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=2, min_samples_split=28; total time= 1.8s [CV] END max_depth=22, max_features=log2, min_samples_leaf=4, min_samples_split=28; total time= 1.9s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=2, min_samples_split=28; total time= 1.7s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=2, min_samples_split=28; total time= 1.8s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=2, min_samples_split=28; total time= 2.1s [CV] END max_depth=33, max_features=sqrt, min_samples_leaf=2, min_samples_split=28; total time= 2.1s [CV] END max_depth=35, max_features=log2, min_samples_leaf=4, min_samples_split=26; total time= 2.1s [CV] END max_depth=35, max_features=log2, min_samples_leaf=4, min_samples_split=26; total time= 2.1s [CV] END max_depth=35, max_features=log2, min_samples_leaf=4, min_samples_split=26; 
total time= 1.9s [CV] END max_depth=28, max_features=log2, min_samples_leaf=6, min_samples_split=22; total time= 1.8s [CV] END max_depth=28, max_features=log2, min_samples_leaf=6, min_samples_split=22; total time= 1.9s [CV] END max_depth=35, max_features=log2, min_samples_leaf=4, min_samples_split=26; total time= 2.1s [CV] END max_depth=28, max_features=log2, min_samples_leaf=6, min_samples_split=22; total time= 2.0s [CV] END max_depth=35, max_features=log2, min_samples_leaf=4, min_samples_split=26; total time= 2.2s [CV] END max_depth=28, max_features=log2, min_samples_leaf=6, min_samples_split=22; total time= 1.3s [CV] END max_depth=31, max_features=sqrt, min_samples_leaf=14, min_samples_split=35; total time= 1.6s [CV] END max_depth=28, max_features=log2, min_samples_leaf=6, min_samples_split=22; total time= 1.8s [CV] END max_depth=24, max_features=log2, min_samples_leaf=18, min_samples_split=26; total time= 1.5s [CV] END max_depth=31, max_features=sqrt, min_samples_leaf=14, min_samples_split=35; total time= 1.8s [CV] END max_depth=31, max_features=sqrt, min_samples_leaf=14, min_samples_split=35; total time= 1.7s [CV] END max_depth=31, max_features=sqrt, min_samples_leaf=14, min_samples_split=35; total time= 1.8s [CV] END max_depth=31, max_features=sqrt, min_samples_leaf=14, min_samples_split=35; total time= 1.6s [CV] END max_depth=24, max_features=log2, min_samples_leaf=18, min_samples_split=26; total time= 1.1s [CV] END max_depth=24, max_features=log2, min_samples_leaf=18, min_samples_split=26; total time= 2.0s [CV] END max_depth=24, max_features=log2, min_samples_leaf=18, min_samples_split=26; total time= 2.0s [CV] END max_depth=24, max_features=log2, min_samples_leaf=18, min_samples_split=26; total time= 1.9s [CV] END max_depth=40, max_features=None, min_samples_leaf=20, min_samples_split=22; total time= 3.0s [CV] END max_depth=40, max_features=None, min_samples_leaf=20, min_samples_split=22; total time= 2.9s [CV] END max_depth=40, max_features=None, 
min_samples_leaf=20, min_samples_split=22; total time= 2.9s [CV] END max_depth=40, max_features=None, min_samples_leaf=20, min_samples_split=22; total time= 3.1s [CV] END max_depth=40, max_features=None, min_samples_leaf=20, min_samples_split=22; total time= 2.5s [CV] END max_depth=28, max_features=None, min_samples_leaf=4, min_samples_split=24; total time= 2.0s [CV] END max_depth=28, max_features=None, min_samples_leaf=4, min_samples_split=24; total time= 2.0s [CV] END max_depth=28, max_features=None, min_samples_leaf=4, min_samples_split=24; total time= 2.2s [CV] END max_depth=22, max_features=log2, min_samples_leaf=12, min_samples_split=35; total time= 1.3s [CV] END max_depth=22, max_features=log2, min_samples_leaf=12, min_samples_split=35; total time= 1.2s [CV] END max_depth=22, max_features=log2, min_samples_leaf=12, min_samples_split=35; total time= 1.3s [CV] END max_depth=22, max_features=log2, min_samples_leaf=12, min_samples_split=35; total time= 0.8s [CV] END max_depth=20, max_features=log2, min_samples_leaf=8, min_samples_split=40; total time= 1.0s [CV] END max_depth=22, max_features=log2, min_samples_leaf=12, min_samples_split=35; total time= 1.1s [CV] END max_depth=20, max_features=log2, min_samples_leaf=8, min_samples_split=40; total time= 0.9s [CV] END max_depth=20, max_features=log2, min_samples_leaf=8, min_samples_split=40; total time= 0.9s [CV] END max_depth=20, max_features=log2, min_samples_leaf=8, min_samples_split=40; total time= 0.9s [CV] END max_depth=20, max_features=log2, min_samples_leaf=8, min_samples_split=40; total time= 0.8s [CV] END max_depth=28, max_features=None, min_samples_leaf=4, min_samples_split=24; total time= 2.6s [CV] END max_depth=28, max_features=None, min_samples_leaf=4, min_samples_split=24; total time= 2.6s [CV] END max_depth=35, max_features=log2, min_samples_leaf=8, min_samples_split=20; total time= 1.1s [CV] END max_depth=35, max_features=log2, min_samples_leaf=8, min_samples_split=20; total time= 1.1s [CV] END 
[CV] END max_depth=35, max_features=log2, min_samples_leaf=8, min_samples_split=20; total time= 1.1s [... verbose cross-validation log truncated for readability: each line reports the sampled max_depth, max_features, min_samples_leaf and min_samples_split values together with a fit time between roughly 0.5s and 3.2s ...]
{'min_samples_split': 22,
'min_samples_leaf': 2,
'max_features': None,
'max_depth': 40}
decision_tree_best_params = {'min_samples_split': 22,
                             'min_samples_leaf': 2,
                             'max_features': None,
                             'max_depth': 40}
from sklearn.tree import DecisionTreeClassifier
decision_tree_class_optimized = DecisionTreeClassifier(random_state=42)
decision_tree_class_optimized.set_params(**decision_tree_best_params)
decision_tree_class_optimized.fit(X_train, y_train)
decision_tree_optimized_model_scores = compute_performance([decision_tree_class_optimized], X_train, y_train, scoring_funs)
decision_tree_optimized_model_scores
std of 0.7785786282978886 of type accuracy is 0.001580751578142394 std of 0.7925699304362881 of type precision is 0.002096575650643993 std of 0.7663060950238333 of type recall is 0.004147805112973715
{'DecisionTreeClassifier': {'accuracy': 0.7785786282978886,
'precision': 0.7925699304362881,
'recall': 0.7663060950238333}}
As anticipated, the scores have worsened.
One option would be to build an ensemble by combining the previously trained classifiers, were it not for the heterogeneity of the test sets they use: when the ensemble makes predictions, each constituent model would need its own version of the test set — the normalized one for k-nearest neighbors, the standardized one for SGD, and the untransformed one for the remaining classifiers. This is not possible with the tools scikit-learn provides out of the box; predictions would be made on a single version of the test set. An alternative would be bagging with k-nearest neighbors, the most promising model according to the earlier results; this idea was short-lived once the training time of the hypothetical model was considered: k-nearest neighbors alone already exhibits excessive training times, and training several of them at once would only make things worse. Bagging the remaining models, SGD and naive Bayes, would make little sense because they are relatively stable models: different models of the same type (naive Bayes, for instance) would produce largely identical predictions, so voting would not help. Since bagging did not pan out, we can try boosting instead, in particular by training an XGBoost classifier.
XGBoost, short for Extreme Gradient Boosting, is a scalable machine-learning library that builds ensembles of decision trees via boosting, combined with a highly optimized variant of gradient descent. Before going into the details of the classification technique that will be used, it is worth introducing the two fundamental concepts it builds on: boosting and gradient boosting. Boosting is an ensemble technique that aims to reduce both the bias and the variance of the models by turning weak learners into strong learners, where a weak learner is a model whose accuracy is only slightly better than that of a classifier making random predictions. Boosting proceeds sequentially: at each iteration a model is trained on the data; afterwards each instance is assigned a weight, with higher values for misclassified instances, so that the next classifier pays more attention to classifying them correctly. Gradient boosting extends this technique by formalizing the additive generation of weak learners as a gradient-descent algorithm on an objective function.
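The additive idea behind gradient boosting can be made concrete with a toy sketch: for squared loss, the negative gradient is simply the residual, so each round fits a weak learner (here a hypothetical depth-1 "stump" helper, not part of the notebook) to the current residuals and adds a damped copy of it to the running prediction.

```python
import numpy as np

# Toy 1-D gradient boosting for squared loss. fit_stump and boost are
# illustrative helper names invented for this sketch.

def fit_stump(x, residuals):
    # Weak learner: a single threshold split, predicting the mean
    # residual on each side.
    best = None
    for t in np.unique(x):
        left, right = residuals[x <= t], residuals[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        pred = np.where(x <= t, left.mean(), right.mean())
        err = ((residuals - pred) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda z: np.where(z <= t, lv, rv)

def boost(x, y, n_rounds=50, learning_rate=0.3):
    pred = np.full_like(y, y.mean(), dtype=float)  # initial constant model
    for _ in range(n_rounds):
        residuals = y - pred            # negative gradient of squared loss
        stump = fit_stump(x, residuals)
        pred += learning_rate * stump(x)  # additive update, damped by the learning rate
    return pred

x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x)
pred = boost(x, y)
train_mae = np.abs(y - pred).mean()  # shrinks as rounds accumulate
print(train_mae)
```

This is only the mechanism in miniature; XGBoost adds regularization, second-order gradient information, and many systems-level optimizations on top of it.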
With this brief introduction to the technique out of the way, let us, as usual, evaluate the classifier's baseline performance. One remark: scikit-learn also offers a gradient-boosting classifier (GradientBoostingClassifier), but the classifier from the xgboost library was observed to be markedly more efficient in terms of running time and memory footprint, so the latter was preferred.
from xgboost import XGBClassifier
xgb_class = XGBClassifier(nthread=8)
xgb_class.fit(X_train,y_train)
xgb_model_scores = compute_performance([xgb_class], X_train, y_train, scoring_funs)
std of 0.8009203301511946 of type accuracy is 0.0023834239461665912 std of 0.8155473232048797 of type precision is 0.0031810459214210744 std of 0.7877337907161346 of type recall is 0.004082406542289643
xgb_class.get_params()
{'objective': 'binary:logistic',
'use_label_encoder': False,
'base_score': 0.5,
'booster': 'gbtree',
'callbacks': None,
'colsample_bylevel': 1,
'colsample_bynode': 1,
'colsample_bytree': 1,
'early_stopping_rounds': None,
'enable_categorical': False,
'eval_metric': None,
'gamma': 0,
'gpu_id': -1,
'grow_policy': 'depthwise',
'importance_type': None,
'interaction_constraints': '',
'learning_rate': 0.300000012,
'max_bin': 256,
'max_cat_to_onehot': 4,
'max_delta_step': 0,
'max_depth': 6,
'max_leaves': 0,
'min_child_weight': 1,
'missing': nan,
'monotone_constraints': '()',
'n_estimators': 100,
'n_jobs': 0,
'num_parallel_tree': 1,
'predictor': 'auto',
'random_state': 0,
'reg_alpha': 0,
'reg_lambda': 1,
'sampling_method': 'uniform',
'scale_pos_weight': 1,
'subsample': 1,
'tree_method': 'exact',
'validate_parameters': 1,
'verbosity': None}
The number of parameters is rather disproportionate compared to the previous models. The authors of XGBoost divide this set into three categories, respectively:
This category includes
Given the large number of hyperparameters, we will focus only on those of the booster that will be used, i.e. gbtree. It was chosen over gblinear based on experimental observations showing that, in most cases, gbtree outperforms the linear models.
Let us first prune the list of hyperparameters: scale_pos_weight can be ignored in this context because the dataset is not imbalanced; lambda can be skipped since other parameters already let us control overfitting; max_delta_step is rarely used in general; colsample_bylevel can be dropped because subsample and colsample_bytree suffice; and it would make no sense to use max_leaves having already chosen max_depth (if we specified a value for max_leaves, max_depth would be ignored). In short, the hyperparameters to be optimized are
booster_most_imp_params = {'learning_rate', 'min_child_weight', 'max_depth', 'gamma', 'subsample', 'colsample_bytree'}
These parameters define the optimization objective and the metric to be computed at each step.
The task at hand is binary classification, so logistic regression will be used as the objective function and the binary classification error as the evaluation metric.
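In xgboost's naming, these two choices correspond to the `objective` and `eval_metric` parameters; a minimal configuration sketch (parameter values only, assuming the xgboost sklearn API):

```python
from xgboost import XGBClassifier

# 'binary:logistic' = logistic-loss binary classification objective;
# 'error' = binary classification error rate as the evaluation metric.
xgb_class = XGBClassifier(
    objective='binary:logistic',
    eval_metric='error',
)
```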
As is routine before the tuning phase, we plot how the classifier's performance changes as one hyperparameter varies, in order to infer a sensible range for it. Clearly, since there are many hyperparameters, only the most influential ones will be plotted.
max_depth_values = range(3, 30, 2)
for fun in scoring_funs:
    plot_classifier_performance('max_depth', max_depth_values, XGBClassifier(), X_train, y_train, fun)
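plot_classifier_performance is a notebook-defined helper not shown in this section; a minimal sketch of the curve it presumably computes can be built with scikit-learn's validation_curve (dataset and parameter values here are illustrative, not the notebook's):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the notebook uses its own X_train/y_train.
X, y = make_classification(n_samples=400, random_state=42)

depths = range(1, 8)
# Evaluate the classifier while one hyperparameter varies.
train_scores, cv_scores = validation_curve(
    DecisionTreeClassifier(random_state=42), X, y,
    param_name='max_depth', param_range=depths,
    cv=3, scoring='accuracy')

mean_cv = cv_scores.mean(axis=1)  # one mean CV accuracy per depth value
print(dict(zip(depths, mean_cv.round(3))))
```

Plotting `mean_cv` against `depths` gives exactly the kind of performance-vs-hyperparameter curve discussed below.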
As is by now well established for tree-based models, letting the trees grow deeper tends to improve the classifiers' performance (although that growth should be kept moderate). For values above 15 the curve flattens out. A value of 13 could already be sufficient.
learning_rate_values = [0.01, 0.03, 0.06, 0.09, 0.1, 0.12, 0.15, 0.18, 0.2, 0.21, 0.23, 0.24, 0.27, 0.3]
for fun in scoring_funs:
    plot_classifier_performance('learning_rate', learning_rate_values, XGBClassifier(), X_train, y_train, fun)
The trend exhibited by the three plots is approximately linear. Low values do not seem to benefit the model, so the natural choice is the default value (which is exactly 0.3).
n_estimators_values = range(100, 300, 15)
for fun in scoring_funs:
    plot_classifier_performance('n_estimators', n_estimators_values, XGBClassifier(), X_train, y_train, fun)
Analogous to the previous plots, but with a much more pronounced linear trend. Were it not for hardware limitations, 300 would have been the chosen value; however, the algorithm's intensive CPU usage (with its attendant consequences) makes high values of this hyperparameter impractical. Accepting a compromise, a value slightly above the midpoint, e.g. 200, is a reasonable choice.
Compared to the tuning of the previous models, a different approach was adopted here, prompted by the algorithm's disproportionate resource usage (relative to the previous models): tuning is carried out in multiple stages, each devoted to a subset of the most important hyperparameters. In this first stage, leaving n_estimators aside, we consider the hyperparameters with the greatest impact on the model's performance, namely max_depth and min_child_weight. A notable drawback of this method is that it trades the global maximum (in terms of performance) for a local one: a grid over all hyperparameters together would certainly be able to identify the best combination, whereas with this staged approach the final values may not correspond to the optimal combination.
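The staged idea can be sketched as two sequential grid searches, where the winners of stage 1 are frozen into the estimator searched in stage 2. The sketch below uses a stand-in DecisionTree on synthetic data (the notebook applies the same scheme to XGBoost); data and grids are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data.
X, y = make_classification(n_samples=300, random_state=0)

# Stage 1: tune the most impactful hyperparameter first.
stage1 = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      {'max_depth': [3, 5, 7]}, cv=3).fit(X, y)

# Stage 2: freeze the stage-1 optimum, then search the next subset.
base = DecisionTreeClassifier(random_state=0, **stage1.best_params_)
stage2 = GridSearchCV(base, {'min_samples_leaf': [1, 5, 10]}, cv=3).fit(X, y)

print(stage1.best_params_, stage2.best_params_)
```

The cost saving is the point: two grids of sizes m and n require m + n fits per fold instead of the m × n of a joint grid, at the price of possibly missing the jointly optimal combination.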
xgb_class_grid_params_1 = {
    'n_estimators': [200],
    'max_depth': range(5, 15, 2),
    'min_child_weight': range(1, 6, 1)
}
max_depth could simply be set to 13, as discussed above; however, the presence of min_child_weight might influence the choice of max_depth, i.e. a lower max_depth paired with a certain value of min_child_weight could yield better scores. For this reason max_depth was allowed to vary over a range of values.
from sklearn.model_selection import GridSearchCV
xgb_class_1 = XGBClassifier()
grid_search_xgb_1 = GridSearchCV(xgb_class_1, xgb_class_grid_params_1, verbose=3)
grid_search_xgb_1.fit(X_train,y_train)
grid_search_xgb_1.best_params_
Fitting 5 folds for each of 25 candidates, totalling 125 fits [... verbose log truncated for readability: per-fold scores rise from ≈0.78 at max_depth=5 to ≈0.87 at max_depth=13 with min_child_weight=1, with fit times growing from ≈2.7s to ≈10s per fold ...]
{'max_depth': 13, 'min_child_weight': 1}
xgb_best_params_1 = {'max_depth': 13, 'min_child_weight': 1}
xgb_class_optimized_1 = XGBClassifier()
xgb_class_optimized_1.set_params(**xgb_best_params_1)
xgb_class_optimized_1.fit(X_train, y_train)
xgb_class_optimized_1_model_scores = compute_performance([xgb_class_optimized_1], X_train, y_train, scoring_funs)
xgb_class_optimized_1_model_scores
std of 0.8718379840437777 of type accuracy is 0.0032432445205672423 std of 0.8783261959432238 of type precision is 0.0032467297053858347 std of 0.8690391490239724 of type recall is 0.005303533313993299
{'XGBClassifier': {'accuracy': 0.8718379840437777,
'precision': 0.8783261959432238,
'recall': 0.8690391490239724}}
Fortunately, the performance has already improved noticeably. Let us now proceed with the second tuning stage, keeping in mind the optimal hyperparameter values just obtained.
gamma_values = [i/20.0 for i in range(1, 30)]
for fun in scoring_funs:
    plot_classifier_performance('gamma', gamma_values, XGBClassifier(max_depth=13, min_child_weight=1), X_train, y_train, fun)
Increasing values of the hyperparameter do not benefit the model's performance. This means the node-splitting criterion is very sensitive to even small reductions of the objective function. gamma = 0.2 is the value that will be chosen.
The hyperparameters featured in this second tuning stage are subsample and colsample_bytree. As mentioned earlier, they are used to reduce overfitting, low values especially. Empirically, the common range of variation for both hyperparameters is [0.5, 1].
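Row subsampling as a regularizer is not specific to XGBoost; a hedged illustration of the same idea with scikit-learn's GradientBoostingClassifier (which shares the `subsample` parameter) on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Illustrative data; subsample < 1 trains each tree on a random
# fraction of the rows, injecting variance-reducing randomness.
X, y = make_classification(n_samples=400, n_informative=5, random_state=1)

results = {}
for sub in (0.5, 0.8, 1.0):
    clf = GradientBoostingClassifier(subsample=sub, random_state=1)
    results[sub] = cross_val_score(clf, X, y, cv=3).mean()

print({k: round(v, 3) for k, v in results.items()})
```

Which value wins depends on the dataset, which is precisely why the notebook grid-searches both subsample and colsample_bytree over [0.5, 1].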
xgb_class_grid_params_2 = {
    'subsample': [i/20.0 for i in range(10, 20)],
    'colsample_bytree': [i/20.0 for i in range(10, 20)],
}
from sklearn.model_selection import GridSearchCV
xgb_class_2 = XGBClassifier(max_depth=13, min_child_weight=1, gamma=0.2)
grid_search_xgb_2 = GridSearchCV(xgb_class_2, xgb_class_grid_params_2, verbose=3)
grid_search_xgb_2.fit(X_train,y_train)
grid_search_xgb_2.best_params_
Fitting 5 folds for each of 100 candidates, totalling 500 fits [... verbose log truncated for readability: with colsample_bytree=0.5, per-fold scores rise from ≈0.843 at subsample=0.5 to ≈0.869 at subsample=0.9, with fit times around 6s per fold ...]
time= 6.2s [CV 5/5] END colsample_bytree=0.5, subsample=0.95;, score=0.864 total time= 6.0s [CV 1/5] END colsample_bytree=0.55, subsample=0.5;, score=0.851 total time= 6.7s [CV 2/5] END colsample_bytree=0.55, subsample=0.5;, score=0.844 total time= 6.6s [CV 3/5] END colsample_bytree=0.55, subsample=0.5;, score=0.858 total time= 6.6s [CV 4/5] END colsample_bytree=0.55, subsample=0.5;, score=0.854 total time= 6.7s [CV 5/5] END colsample_bytree=0.55, subsample=0.5;, score=0.847 total time= 6.6s [CV 1/5] END colsample_bytree=0.55, subsample=0.55;, score=0.847 total time= 6.7s [CV 2/5] END colsample_bytree=0.55, subsample=0.55;, score=0.850 total time= 6.8s [CV 3/5] END colsample_bytree=0.55, subsample=0.55;, score=0.857 total time= 6.7s [CV 4/5] END colsample_bytree=0.55, subsample=0.55;, score=0.854 total time= 6.7s [CV 5/5] END colsample_bytree=0.55, subsample=0.55;, score=0.852 total time= 6.8s [CV 1/5] END colsample_bytree=0.55, subsample=0.6;, score=0.851 total time= 6.8s [CV 2/5] END colsample_bytree=0.55, subsample=0.6;, score=0.849 total time= 6.7s [CV 3/5] END colsample_bytree=0.55, subsample=0.6;, score=0.860 total time= 6.7s [CV 4/5] END colsample_bytree=0.55, subsample=0.6;, score=0.855 total time= 6.7s [CV 5/5] END colsample_bytree=0.55, subsample=0.6;, score=0.853 total time= 6.7s [CV 1/5] END colsample_bytree=0.55, subsample=0.65;, score=0.853 total time= 6.7s [CV 2/5] END colsample_bytree=0.55, subsample=0.65;, score=0.856 total time= 6.8s [CV 3/5] END colsample_bytree=0.55, subsample=0.65;, score=0.863 total time= 6.7s [CV 4/5] END colsample_bytree=0.55, subsample=0.65;, score=0.862 total time= 6.7s [CV 5/5] END colsample_bytree=0.55, subsample=0.65;, score=0.857 total time= 6.7s [CV 1/5] END colsample_bytree=0.55, subsample=0.7;, score=0.860 total time= 6.6s [CV 2/5] END colsample_bytree=0.55, subsample=0.7;, score=0.859 total time= 6.6s [CV 3/5] END colsample_bytree=0.55, subsample=0.7;, score=0.863 total time= 6.6s [CV 4/5] END 
colsample_bytree=0.55, subsample=0.7;, score=0.862 total time= 6.7s [CV 5/5] END colsample_bytree=0.55, subsample=0.7;, score=0.860 total time= 6.9s [CV 1/5] END colsample_bytree=0.55, subsample=0.75;, score=0.856 total time= 6.5s [CV 2/5] END colsample_bytree=0.55, subsample=0.75;, score=0.859 total time= 6.5s [CV 3/5] END colsample_bytree=0.55, subsample=0.75;, score=0.865 total time= 6.4s [CV 4/5] END colsample_bytree=0.55, subsample=0.75;, score=0.868 total time= 6.5s [CV 5/5] END colsample_bytree=0.55, subsample=0.75;, score=0.865 total time= 6.5s [CV 1/5] END colsample_bytree=0.55, subsample=0.8;, score=0.862 total time= 6.5s [CV 2/5] END colsample_bytree=0.55, subsample=0.8;, score=0.862 total time= 6.4s [CV 3/5] END colsample_bytree=0.55, subsample=0.8;, score=0.868 total time= 6.4s [CV 4/5] END colsample_bytree=0.55, subsample=0.8;, score=0.868 total time= 6.4s [CV 5/5] END colsample_bytree=0.55, subsample=0.8;, score=0.866 total time= 6.4s [CV 1/5] END colsample_bytree=0.55, subsample=0.85;, score=0.863 total time= 6.3s [CV 2/5] END colsample_bytree=0.55, subsample=0.85;, score=0.864 total time= 6.3s [CV 3/5] END colsample_bytree=0.55, subsample=0.85;, score=0.867 total time= 6.3s [CV 4/5] END colsample_bytree=0.55, subsample=0.85;, score=0.869 total time= 6.3s [CV 5/5] END colsample_bytree=0.55, subsample=0.85;, score=0.867 total time= 6.3s [CV 1/5] END colsample_bytree=0.55, subsample=0.9;, score=0.864 total time= 6.3s [CV 2/5] END colsample_bytree=0.55, subsample=0.9;, score=0.861 total time= 6.2s [CV 3/5] END colsample_bytree=0.55, subsample=0.9;, score=0.871 total time= 6.2s [CV 4/5] END colsample_bytree=0.55, subsample=0.9;, score=0.868 total time= 6.3s [CV 5/5] END colsample_bytree=0.55, subsample=0.9;, score=0.868 total time= 6.3s [CV 1/5] END colsample_bytree=0.55, subsample=0.95;, score=0.862 total time= 6.2s [CV 2/5] END colsample_bytree=0.55, subsample=0.95;, score=0.867 total time= 6.1s [CV 3/5] END colsample_bytree=0.55, subsample=0.95;, 
score=0.870 total time= 6.1s [CV 4/5] END colsample_bytree=0.55, subsample=0.95;, score=0.866 total time= 6.1s [CV 5/5] END colsample_bytree=0.55, subsample=0.95;, score=0.865 total time= 6.2s [CV 1/5] END colsample_bytree=0.6, subsample=0.5;, score=0.842 total time= 6.9s [CV 2/5] END colsample_bytree=0.6, subsample=0.5;, score=0.848 total time= 6.9s [CV 3/5] END colsample_bytree=0.6, subsample=0.5;, score=0.857 total time= 7.0s [CV 4/5] END colsample_bytree=0.6, subsample=0.5;, score=0.850 total time= 7.1s [CV 5/5] END colsample_bytree=0.6, subsample=0.5;, score=0.853 total time= 6.9s [CV 1/5] END colsample_bytree=0.6, subsample=0.55;, score=0.849 total time= 7.0s [CV 2/5] END colsample_bytree=0.6, subsample=0.55;, score=0.846 total time= 7.0s [CV 3/5] END colsample_bytree=0.6, subsample=0.55;, score=0.858 total time= 7.0s [CV 4/5] END colsample_bytree=0.6, subsample=0.55;, score=0.856 total time= 7.0s [CV 5/5] END colsample_bytree=0.6, subsample=0.55;, score=0.854 total time= 7.0s [CV 1/5] END colsample_bytree=0.6, subsample=0.6;, score=0.851 total time= 7.0s [CV 2/5] END colsample_bytree=0.6, subsample=0.6;, score=0.855 total time= 7.1s [CV 3/5] END colsample_bytree=0.6, subsample=0.6;, score=0.862 total time= 7.0s [CV 4/5] END colsample_bytree=0.6, subsample=0.6;, score=0.856 total time= 6.9s [CV 5/5] END colsample_bytree=0.6, subsample=0.6;, score=0.856 total time= 7.2s [CV 1/5] END colsample_bytree=0.6, subsample=0.65;, score=0.857 total time= 7.9s [CV 2/5] END colsample_bytree=0.6, subsample=0.65;, score=0.850 total time= 7.5s [CV 3/5] END colsample_bytree=0.6, subsample=0.65;, score=0.865 total time= 8.5s [CV 4/5] END colsample_bytree=0.6, subsample=0.65;, score=0.864 total time= 7.4s [CV 5/5] END colsample_bytree=0.6, subsample=0.65;, score=0.858 total time= 7.4s [CV 1/5] END colsample_bytree=0.6, subsample=0.7;, score=0.857 total time= 7.8s [CV 2/5] END colsample_bytree=0.6, subsample=0.7;, score=0.859 total time= 7.3s [CV 3/5] END colsample_bytree=0.6, 
subsample=0.7;, score=0.865 total time= 7.1s [CV 4/5] END colsample_bytree=0.6, subsample=0.7;, score=0.865 total time= 7.1s [CV 5/5] END colsample_bytree=0.6, subsample=0.7;, score=0.862 total time= 7.1s [CV 1/5] END colsample_bytree=0.6, subsample=0.75;, score=0.860 total time= 7.5s [CV 2/5] END colsample_bytree=0.6, subsample=0.75;, score=0.865 total time= 7.4s [CV 3/5] END colsample_bytree=0.6, subsample=0.75;, score=0.872 total time= 7.5s [CV 4/5] END colsample_bytree=0.6, subsample=0.75;, score=0.864 total time= 7.0s [CV 5/5] END colsample_bytree=0.6, subsample=0.75;, score=0.863 total time= 7.1s [CV 1/5] END colsample_bytree=0.6, subsample=0.8;, score=0.861 total time= 6.9s [CV 2/5] END colsample_bytree=0.6, subsample=0.8;, score=0.864 total time= 6.9s [CV 3/5] END colsample_bytree=0.6, subsample=0.8;, score=0.869 total time= 6.9s [CV 4/5] END colsample_bytree=0.6, subsample=0.8;, score=0.867 total time= 7.0s [CV 5/5] END colsample_bytree=0.6, subsample=0.8;, score=0.867 total time= 6.8s [CV 1/5] END colsample_bytree=0.6, subsample=0.85;, score=0.862 total time= 7.1s [CV 2/5] END colsample_bytree=0.6, subsample=0.85;, score=0.865 total time= 6.8s [CV 3/5] END colsample_bytree=0.6, subsample=0.85;, score=0.867 total time= 6.9s [CV 4/5] END colsample_bytree=0.6, subsample=0.85;, score=0.868 total time= 6.8s [CV 5/5] END colsample_bytree=0.6, subsample=0.85;, score=0.869 total time= 6.7s [CV 1/5] END colsample_bytree=0.6, subsample=0.9;, score=0.862 total time= 6.6s [CV 2/5] END colsample_bytree=0.6, subsample=0.9;, score=0.866 total time= 6.7s [CV 3/5] END colsample_bytree=0.6, subsample=0.9;, score=0.869 total time= 7.0s [CV 4/5] END colsample_bytree=0.6, subsample=0.9;, score=0.870 total time= 6.9s [CV 5/5] END colsample_bytree=0.6, subsample=0.9;, score=0.868 total time= 6.7s [CV 1/5] END colsample_bytree=0.6, subsample=0.95;, score=0.866 total time= 6.6s [CV 2/5] END colsample_bytree=0.6, subsample=0.95;, score=0.866 total time= 6.7s [CV 3/5] END 
colsample_bytree=0.6, subsample=0.95;, score=0.870 total time= 6.6s [CV 4/5] END colsample_bytree=0.6, subsample=0.95;, score=0.868 total time= 6.6s [CV 5/5] END colsample_bytree=0.6, subsample=0.95;, score=0.870 total time= 6.6s [CV 1/5] END colsample_bytree=0.65, subsample=0.5;, score=0.848 total time= 7.4s [CV 2/5] END colsample_bytree=0.65, subsample=0.5;, score=0.842 total time= 7.8s [CV 3/5] END colsample_bytree=0.65, subsample=0.5;, score=0.853 total time= 7.3s [CV 4/5] END colsample_bytree=0.65, subsample=0.5;, score=0.852 total time= 7.3s [CV 5/5] END colsample_bytree=0.65, subsample=0.5;, score=0.851 total time= 7.5s [CV 1/5] END colsample_bytree=0.65, subsample=0.55;, score=0.854 total time= 8.0s [CV 2/5] END colsample_bytree=0.65, subsample=0.55;, score=0.849 total time= 7.4s [CV 3/5] END colsample_bytree=0.65, subsample=0.55;, score=0.860 total time= 7.5s [CV 4/5] END colsample_bytree=0.65, subsample=0.55;, score=0.855 total time= 7.4s [CV 5/5] END colsample_bytree=0.65, subsample=0.55;, score=0.850 total time= 7.5s [CV 1/5] END colsample_bytree=0.65, subsample=0.6;, score=0.855 total time= 7.4s [CV 2/5] END colsample_bytree=0.65, subsample=0.6;, score=0.854 total time= 7.9s [CV 3/5] END colsample_bytree=0.65, subsample=0.6;, score=0.861 total time= 7.8s [CV 4/5] END colsample_bytree=0.65, subsample=0.6;, score=0.860 total time= 7.5s [CV 5/5] END colsample_bytree=0.65, subsample=0.6;, score=0.854 total time= 7.4s [CV 1/5] END colsample_bytree=0.65, subsample=0.65;, score=0.858 total time= 7.3s [CV 2/5] END colsample_bytree=0.65, subsample=0.65;, score=0.859 total time= 7.3s [CV 3/5] END colsample_bytree=0.65, subsample=0.65;, score=0.864 total time= 7.6s [CV 4/5] END colsample_bytree=0.65, subsample=0.65;, score=0.861 total time= 7.4s [CV 5/5] END colsample_bytree=0.65, subsample=0.65;, score=0.858 total time= 7.5s [CV 1/5] END colsample_bytree=0.65, subsample=0.7;, score=0.859 total time= 7.3s [CV 2/5] END colsample_bytree=0.65, subsample=0.7;, 
score=0.858 total time= 7.4s [CV 3/5] END colsample_bytree=0.65, subsample=0.7;, score=0.868 total time= 7.2s [CV 4/5] END colsample_bytree=0.65, subsample=0.7;, score=0.867 total time= 7.5s [CV 5/5] END colsample_bytree=0.65, subsample=0.7;, score=0.864 total time= 7.3s [CV 1/5] END colsample_bytree=0.65, subsample=0.75;, score=0.862 total time= 7.3s [CV 2/5] END colsample_bytree=0.65, subsample=0.75;, score=0.862 total time= 7.2s [CV 3/5] END colsample_bytree=0.65, subsample=0.75;, score=0.867 total time= 7.3s [CV 4/5] END colsample_bytree=0.65, subsample=0.75;, score=0.867 total time= 7.2s [CV 5/5] END colsample_bytree=0.65, subsample=0.75;, score=0.860 total time= 7.4s [CV 1/5] END colsample_bytree=0.65, subsample=0.8;, score=0.860 total time= 7.1s [CV 2/5] END colsample_bytree=0.65, subsample=0.8;, score=0.864 total time= 7.0s [CV 3/5] END colsample_bytree=0.65, subsample=0.8;, score=0.870 total time= 7.0s [CV 4/5] END colsample_bytree=0.65, subsample=0.8;, score=0.869 total time= 7.2s [CV 5/5] END colsample_bytree=0.65, subsample=0.8;, score=0.865 total time= 7.1s [CV 1/5] END colsample_bytree=0.65, subsample=0.85;, score=0.863 total time= 7.0s [CV 2/5] END colsample_bytree=0.65, subsample=0.85;, score=0.865 total time= 7.0s [CV 3/5] END colsample_bytree=0.65, subsample=0.85;, score=0.868 total time= 7.1s [CV 4/5] END colsample_bytree=0.65, subsample=0.85;, score=0.867 total time= 7.0s [CV 5/5] END colsample_bytree=0.65, subsample=0.85;, score=0.869 total time= 7.0s [CV 1/5] END colsample_bytree=0.65, subsample=0.9;, score=0.868 total time= 6.9s [CV 2/5] END colsample_bytree=0.65, subsample=0.9;, score=0.868 total time= 6.9s [CV 3/5] END colsample_bytree=0.65, subsample=0.9;, score=0.868 total time= 7.0s [CV 4/5] END colsample_bytree=0.65, subsample=0.9;, score=0.872 total time= 6.9s [CV 5/5] END colsample_bytree=0.65, subsample=0.9;, score=0.866 total time= 6.9s [CV 1/5] END colsample_bytree=0.65, subsample=0.95;, score=0.865 total time= 6.7s [CV 2/5] END 
colsample_bytree=0.65, subsample=0.95;, score=0.867 total time= 7.0s [CV 3/5] END colsample_bytree=0.65, subsample=0.95;, score=0.870 total time= 6.8s [CV 4/5] END colsample_bytree=0.65, subsample=0.95;, score=0.866 total time= 6.8s [CV 5/5] END colsample_bytree=0.65, subsample=0.95;, score=0.866 total time= 6.7s [CV 1/5] END colsample_bytree=0.7, subsample=0.5;, score=0.849 total time= 7.8s [CV 2/5] END colsample_bytree=0.7, subsample=0.5;, score=0.847 total time= 8.2s [CV 3/5] END colsample_bytree=0.7, subsample=0.5;, score=0.856 total time= 7.7s [CV 4/5] END colsample_bytree=0.7, subsample=0.5;, score=0.853 total time= 7.7s [CV 5/5] END colsample_bytree=0.7, subsample=0.5;, score=0.852 total time= 7.6s [CV 1/5] END colsample_bytree=0.7, subsample=0.55;, score=0.853 total time= 7.7s [CV 2/5] END colsample_bytree=0.7, subsample=0.55;, score=0.853 total time= 7.8s [CV 3/5] END colsample_bytree=0.7, subsample=0.55;, score=0.859 total time= 7.8s [CV 4/5] END colsample_bytree=0.7, subsample=0.55;, score=0.859 total time= 7.9s [CV 5/5] END colsample_bytree=0.7, subsample=0.55;, score=0.852 total time= 7.8s [CV 1/5] END colsample_bytree=0.7, subsample=0.6;, score=0.854 total time= 7.7s [CV 2/5] END colsample_bytree=0.7, subsample=0.6;, score=0.855 total time= 7.7s [CV 3/5] END colsample_bytree=0.7, subsample=0.6;, score=0.859 total time= 7.8s [CV 4/5] END colsample_bytree=0.7, subsample=0.6;, score=0.857 total time= 7.7s [CV 5/5] END colsample_bytree=0.7, subsample=0.6;, score=0.854 total time= 7.7s [CV 1/5] END colsample_bytree=0.7, subsample=0.65;, score=0.860 total time= 7.7s [CV 2/5] END colsample_bytree=0.7, subsample=0.65;, score=0.862 total time= 7.8s [CV 3/5] END colsample_bytree=0.7, subsample=0.65;, score=0.866 total time= 7.5s [CV 4/5] END colsample_bytree=0.7, subsample=0.65;, score=0.861 total time= 7.6s [CV 5/5] END colsample_bytree=0.7, subsample=0.65;, score=0.859 total time= 7.7s [CV 1/5] END colsample_bytree=0.7, subsample=0.7;, score=0.863 total time= 
7.7s [CV 2/5] END colsample_bytree=0.7, subsample=0.7;, score=0.861 total time= 7.5s [CV 3/5] END colsample_bytree=0.7, subsample=0.7;, score=0.868 total time= 7.5s [CV 4/5] END colsample_bytree=0.7, subsample=0.7;, score=0.867 total time= 7.5s [CV 5/5] END colsample_bytree=0.7, subsample=0.7;, score=0.860 total time= 7.7s [CV 1/5] END colsample_bytree=0.7, subsample=0.75;, score=0.864 total time= 7.4s [CV 2/5] END colsample_bytree=0.7, subsample=0.75;, score=0.862 total time= 7.5s [CV 3/5] END colsample_bytree=0.7, subsample=0.75;, score=0.867 total time= 7.5s [CV 4/5] END colsample_bytree=0.7, subsample=0.75;, score=0.863 total time= 7.5s [CV 5/5] END colsample_bytree=0.7, subsample=0.75;, score=0.861 total time= 7.4s [CV 1/5] END colsample_bytree=0.7, subsample=0.8;, score=0.865 total time= 7.4s [CV 2/5] END colsample_bytree=0.7, subsample=0.8;, score=0.861 total time= 7.3s [CV 3/5] END colsample_bytree=0.7, subsample=0.8;, score=0.870 total time= 7.4s [CV 4/5] END colsample_bytree=0.7, subsample=0.8;, score=0.867 total time= 7.3s [CV 5/5] END colsample_bytree=0.7, subsample=0.8;, score=0.864 total time= 7.3s [CV 1/5] END colsample_bytree=0.7, subsample=0.85;, score=0.864 total time= 7.3s [CV 2/5] END colsample_bytree=0.7, subsample=0.85;, score=0.865 total time= 7.4s [CV 3/5] END colsample_bytree=0.7, subsample=0.85;, score=0.871 total time= 7.3s [CV 4/5] END colsample_bytree=0.7, subsample=0.85;, score=0.871 total time= 7.3s [CV 5/5] END colsample_bytree=0.7, subsample=0.85;, score=0.867 total time= 7.3s [CV 1/5] END colsample_bytree=0.7, subsample=0.9;, score=0.866 total time= 7.2s [CV 2/5] END colsample_bytree=0.7, subsample=0.9;, score=0.866 total time= 7.1s [CV 3/5] END colsample_bytree=0.7, subsample=0.9;, score=0.872 total time= 7.2s [CV 4/5] END colsample_bytree=0.7, subsample=0.9;, score=0.869 total time= 7.1s [CV 5/5] END colsample_bytree=0.7, subsample=0.9;, score=0.866 total time= 7.3s [CV 1/5] END colsample_bytree=0.7, subsample=0.95;, score=0.867 
total time= 7.0s [CV 2/5] END colsample_bytree=0.7, subsample=0.95;, score=0.869 total time= 6.9s [CV 3/5] END colsample_bytree=0.7, subsample=0.95;, score=0.868 total time= 7.0s [CV 4/5] END colsample_bytree=0.7, subsample=0.95;, score=0.871 total time= 7.2s [CV 5/5] END colsample_bytree=0.7, subsample=0.95;, score=0.869 total time= 7.0s [CV 1/5] END colsample_bytree=0.75, subsample=0.5;, score=0.852 total time= 7.9s [CV 2/5] END colsample_bytree=0.75, subsample=0.5;, score=0.852 total time= 8.1s [CV 3/5] END colsample_bytree=0.75, subsample=0.5;, score=0.858 total time= 8.1s [CV 4/5] END colsample_bytree=0.75, subsample=0.5;, score=0.855 total time= 7.9s [CV 5/5] END colsample_bytree=0.75, subsample=0.5;, score=0.850 total time= 8.0s [CV 1/5] END colsample_bytree=0.75, subsample=0.55;, score=0.855 total time= 8.1s [CV 2/5] END colsample_bytree=0.75, subsample=0.55;, score=0.854 total time= 8.3s [CV 3/5] END colsample_bytree=0.75, subsample=0.55;, score=0.860 total time= 8.1s [CV 4/5] END colsample_bytree=0.75, subsample=0.55;, score=0.860 total time= 8.1s [CV 5/5] END colsample_bytree=0.75, subsample=0.55;, score=0.853 total time= 8.0s [CV 1/5] END colsample_bytree=0.75, subsample=0.6;, score=0.856 total time= 8.2s [CV 2/5] END colsample_bytree=0.75, subsample=0.6;, score=0.854 total time= 8.0s [CV 3/5] END colsample_bytree=0.75, subsample=0.6;, score=0.862 total time= 8.0s [CV 4/5] END colsample_bytree=0.75, subsample=0.6;, score=0.861 total time= 8.1s [CV 5/5] END colsample_bytree=0.75, subsample=0.6;, score=0.854 total time= 8.0s [CV 1/5] END colsample_bytree=0.75, subsample=0.65;, score=0.864 total time= 8.0s [CV 2/5] END colsample_bytree=0.75, subsample=0.65;, score=0.859 total time= 8.0s [CV 3/5] END colsample_bytree=0.75, subsample=0.65;, score=0.858 total time= 8.0s [CV 4/5] END colsample_bytree=0.75, subsample=0.65;, score=0.862 total time= 7.9s [CV 5/5] END colsample_bytree=0.75, subsample=0.65;, score=0.860 total time= 7.9s [CV 1/5] END 
colsample_bytree=0.75, subsample=0.7;, score=0.862 total time= 7.8s [CV 2/5] END colsample_bytree=0.75, subsample=0.7;, score=0.863 total time= 8.0s [CV 3/5] END colsample_bytree=0.75, subsample=0.7;, score=0.864 total time= 7.9s [CV 4/5] END colsample_bytree=0.75, subsample=0.7;, score=0.864 total time= 7.9s [CV 5/5] END colsample_bytree=0.75, subsample=0.7;, score=0.861 total time= 7.8s [CV 1/5] END colsample_bytree=0.75, subsample=0.75;, score=0.859 total time= 7.9s [CV 2/5] END colsample_bytree=0.75, subsample=0.75;, score=0.865 total time= 7.6s [CV 3/5] END colsample_bytree=0.75, subsample=0.75;, score=0.866 total time= 7.7s [CV 4/5] END colsample_bytree=0.75, subsample=0.75;, score=0.865 total time= 7.7s [CV 5/5] END colsample_bytree=0.75, subsample=0.75;, score=0.863 total time= 8.0s [CV 1/5] END colsample_bytree=0.75, subsample=0.8;, score=0.864 total time= 7.6s [CV 2/5] END colsample_bytree=0.75, subsample=0.8;, score=0.869 total time= 7.6s [CV 3/5] END colsample_bytree=0.75, subsample=0.8;, score=0.868 total time= 7.7s [CV 4/5] END colsample_bytree=0.75, subsample=0.8;, score=0.870 total time= 7.8s [CV 5/5] END colsample_bytree=0.75, subsample=0.8;, score=0.869 total time= 7.6s [CV 1/5] END colsample_bytree=0.75, subsample=0.85;, score=0.867 total time= 7.5s [CV 2/5] END colsample_bytree=0.75, subsample=0.85;, score=0.869 total time= 7.5s [CV 3/5] END colsample_bytree=0.75, subsample=0.85;, score=0.871 total time= 7.6s [CV 4/5] END colsample_bytree=0.75, subsample=0.85;, score=0.870 total time= 7.5s [CV 5/5] END colsample_bytree=0.75, subsample=0.85;, score=0.870 total time= 7.6s [CV 1/5] END colsample_bytree=0.75, subsample=0.9;, score=0.869 total time= 7.6s [CV 2/5] END colsample_bytree=0.75, subsample=0.9;, score=0.869 total time= 7.5s [CV 3/5] END colsample_bytree=0.75, subsample=0.9;, score=0.868 total time= 7.4s [CV 4/5] END colsample_bytree=0.75, subsample=0.9;, score=0.870 total time= 7.4s [CV 5/5] END colsample_bytree=0.75, subsample=0.9;, 
score=0.873 total time= 7.4s [CV 1/5] END colsample_bytree=0.75, subsample=0.95;, score=0.868 total time= 7.5s [CV 2/5] END colsample_bytree=0.75, subsample=0.95;, score=0.871 total time= 7.5s [CV 3/5] END colsample_bytree=0.75, subsample=0.95;, score=0.872 total time= 7.8s [CV 4/5] END colsample_bytree=0.75, subsample=0.95;, score=0.873 total time= 7.5s [CV 5/5] END colsample_bytree=0.75, subsample=0.95;, score=0.868 total time= 7.4s [CV 1/5] END colsample_bytree=0.8, subsample=0.5;, score=0.851 total time= 8.7s [CV 2/5] END colsample_bytree=0.8, subsample=0.5;, score=0.849 total time= 8.4s [CV 3/5] END colsample_bytree=0.8, subsample=0.5;, score=0.854 total time= 8.7s [CV 4/5] END colsample_bytree=0.8, subsample=0.5;, score=0.855 total time= 8.6s [CV 5/5] END colsample_bytree=0.8, subsample=0.5;, score=0.851 total time= 8.4s [CV 1/5] END colsample_bytree=0.8, subsample=0.55;, score=0.855 total time= 7.9s [CV 2/5] END colsample_bytree=0.8, subsample=0.55;, score=0.853 total time= 7.9s [CV 3/5] END colsample_bytree=0.8, subsample=0.55;, score=0.855 total time= 7.9s [CV 4/5] END colsample_bytree=0.8, subsample=0.55;, score=0.858 total time= 7.9s [CV 5/5] END colsample_bytree=0.8, subsample=0.55;, score=0.848 total time= 7.9s [CV 1/5] END colsample_bytree=0.8, subsample=0.6;, score=0.856 total time= 7.8s [CV 2/5] END colsample_bytree=0.8, subsample=0.6;, score=0.856 total time= 7.9s [CV 3/5] END colsample_bytree=0.8, subsample=0.6;, score=0.860 total time= 8.0s [CV 4/5] END colsample_bytree=0.8, subsample=0.6;, score=0.862 total time= 7.8s [CV 5/5] END colsample_bytree=0.8, subsample=0.6;, score=0.855 total time= 7.9s [CV 1/5] END colsample_bytree=0.8, subsample=0.65;, score=0.856 total time= 7.8s [CV 2/5] END colsample_bytree=0.8, subsample=0.65;, score=0.856 total time= 7.8s [CV 3/5] END colsample_bytree=0.8, subsample=0.65;, score=0.860 total time= 7.8s [CV 4/5] END colsample_bytree=0.8, subsample=0.65;, score=0.861 total time= 7.8s [CV 5/5] END 
colsample_bytree=0.8, subsample=0.65;, score=0.854 total time= 7.9s [CV 1/5] END colsample_bytree=0.8, subsample=0.7;, score=0.862 total time= 7.7s [CV 2/5] END colsample_bytree=0.8, subsample=0.7;, score=0.863 total time= 7.7s [CV 3/5] END colsample_bytree=0.8, subsample=0.7;, score=0.863 total time= 7.7s [CV 4/5] END colsample_bytree=0.8, subsample=0.7;, score=0.864 total time= 7.7s [CV 5/5] END colsample_bytree=0.8, subsample=0.7;, score=0.863 total time= 7.7s [CV 1/5] END colsample_bytree=0.8, subsample=0.75;, score=0.863 total time= 8.0s [CV 2/5] END colsample_bytree=0.8, subsample=0.75;, score=0.863 total time= 8.4s [CV 3/5] END colsample_bytree=0.8, subsample=0.75;, score=0.863 total time= 7.9s [CV 4/5] END colsample_bytree=0.8, subsample=0.75;, score=0.865 total time= 7.6s [CV 5/5] END colsample_bytree=0.8, subsample=0.75;, score=0.862 total time= 7.6s [CV 1/5] END colsample_bytree=0.8, subsample=0.8;, score=0.864 total time= 7.6s [CV 2/5] END colsample_bytree=0.8, subsample=0.8;, score=0.863 total time= 7.5s [CV 3/5] END colsample_bytree=0.8, subsample=0.8;, score=0.871 total time= 7.5s [CV 4/5] END colsample_bytree=0.8, subsample=0.8;, score=0.868 total time= 7.5s [CV 5/5] END colsample_bytree=0.8, subsample=0.8;, score=0.865 total time= 7.6s [CV 1/5] END colsample_bytree=0.8, subsample=0.85;, score=0.868 total time= 7.5s [CV 2/5] END colsample_bytree=0.8, subsample=0.85;, score=0.866 total time= 7.3s [CV 3/5] END colsample_bytree=0.8, subsample=0.85;, score=0.873 total time= 7.3s [CV 4/5] END colsample_bytree=0.8, subsample=0.85;, score=0.871 total time= 7.5s [CV 5/5] END colsample_bytree=0.8, subsample=0.85;, score=0.868 total time= 7.4s [CV 1/5] END colsample_bytree=0.8, subsample=0.9;, score=0.868 total time= 7.2s [CV 2/5] END colsample_bytree=0.8, subsample=0.9;, score=0.870 total time= 7.3s [CV 3/5] END colsample_bytree=0.8, subsample=0.9;, score=0.870 total time= 7.4s [CV 4/5] END colsample_bytree=0.8, subsample=0.9;, score=0.870 total time= 7.2s 
[CV 5/5] END colsample_bytree=0.8, subsample=0.9;, score=0.868 total time= 7.3s [CV 1/5] END colsample_bytree=0.8, subsample=0.95;, score=0.869 total time= 7.2s [CV 2/5] END colsample_bytree=0.8, subsample=0.95;, score=0.868 total time= 7.1s [CV 3/5] END colsample_bytree=0.8, subsample=0.95;, score=0.873 total time= 7.8s [CV 4/5] END colsample_bytree=0.8, subsample=0.95;, score=0.870 total time= 7.1s [CV 5/5] END colsample_bytree=0.8, subsample=0.95;, score=0.867 total time= 7.1s [CV 1/5] END colsample_bytree=0.85, subsample=0.5;, score=0.852 total time= 8.0s [CV 2/5] END colsample_bytree=0.85, subsample=0.5;, score=0.854 total time= 8.3s [CV 3/5] END colsample_bytree=0.85, subsample=0.5;, score=0.852 total time= 8.0s [CV 4/5] END colsample_bytree=0.85, subsample=0.5;, score=0.850 total time= 7.9s [CV 5/5] END colsample_bytree=0.85, subsample=0.5;, score=0.852 total time= 8.4s [CV 1/5] END colsample_bytree=0.85, subsample=0.55;, score=0.854 total time= 8.1s [CV 2/5] END colsample_bytree=0.85, subsample=0.55;, score=0.851 total time= 8.1s [CV 3/5] END colsample_bytree=0.85, subsample=0.55;, score=0.856 total time= 8.8s [CV 4/5] END colsample_bytree=0.85, subsample=0.55;, score=0.855 total time= 8.9s [CV 5/5] END colsample_bytree=0.85, subsample=0.55;, score=0.851 total time= 8.7s [CV 1/5] END colsample_bytree=0.85, subsample=0.6;, score=0.854 total time= 8.6s [CV 2/5] END colsample_bytree=0.85, subsample=0.6;, score=0.861 total time= 8.7s [CV 3/5] END colsample_bytree=0.85, subsample=0.6;, score=0.862 total time= 8.7s [CV 4/5] END colsample_bytree=0.85, subsample=0.6;, score=0.861 total time= 8.6s [CV 5/5] END colsample_bytree=0.85, subsample=0.6;, score=0.856 total time= 8.6s [CV 1/5] END colsample_bytree=0.85, subsample=0.65;, score=0.859 total time= 8.7s [CV 2/5] END colsample_bytree=0.85, subsample=0.65;, score=0.858 total time= 8.5s [CV 3/5] END colsample_bytree=0.85, subsample=0.65;, score=0.862 total time= 8.5s [CV 4/5] END colsample_bytree=0.85, 
subsample=0.65;, score=0.864 total time= 9.2s [CV 5/5] END colsample_bytree=0.85, subsample=0.65;, score=0.860 total time= 8.6s [CV 1/5] END colsample_bytree=0.85, subsample=0.7;, score=0.864 total time= 8.5s [CV 2/5] END colsample_bytree=0.85, subsample=0.7;, score=0.861 total time= 8.3s [CV 3/5] END colsample_bytree=0.85, subsample=0.7;, score=0.867 total time= 8.6s [CV 4/5] END colsample_bytree=0.85, subsample=0.7;, score=0.865 total time= 8.5s [CV 5/5] END colsample_bytree=0.85, subsample=0.7;, score=0.861 total time= 8.4s [CV 1/5] END colsample_bytree=0.85, subsample=0.75;, score=0.864 total time= 8.3s [CV 2/5] END colsample_bytree=0.85, subsample=0.75;, score=0.867 total time= 8.3s [CV 3/5] END colsample_bytree=0.85, subsample=0.75;, score=0.867 total time= 8.2s [CV 4/5] END colsample_bytree=0.85, subsample=0.75;, score=0.868 total time= 8.3s [CV 5/5] END colsample_bytree=0.85, subsample=0.75;, score=0.863 total time= 8.4s [CV 1/5] END colsample_bytree=0.85, subsample=0.8;, score=0.864 total time= 8.1s [CV 2/5] END colsample_bytree=0.85, subsample=0.8;, score=0.865 total time= 8.1s [CV 3/5] END colsample_bytree=0.85, subsample=0.8;, score=0.871 total time= 8.1s [CV 4/5] END colsample_bytree=0.85, subsample=0.8;, score=0.872 total time= 8.2s [CV 5/5] END colsample_bytree=0.85, subsample=0.8;, score=0.866 total time= 8.1s [CV 1/5] END colsample_bytree=0.85, subsample=0.85;, score=0.866 total time= 8.0s [CV 2/5] END colsample_bytree=0.85, subsample=0.85;, score=0.867 total time= 8.1s [CV 3/5] END colsample_bytree=0.85, subsample=0.85;, score=0.867 total time= 8.0s [CV 4/5] END colsample_bytree=0.85, subsample=0.85;, score=0.867 total time= 8.2s [CV 5/5] END colsample_bytree=0.85, subsample=0.85;, score=0.866 total time= 7.9s [CV 1/5] END colsample_bytree=0.85, subsample=0.9;, score=0.869 total time= 7.9s [CV 2/5] END colsample_bytree=0.85, subsample=0.9;, score=0.870 total time= 7.9s [CV 3/5] END colsample_bytree=0.85, subsample=0.9;, score=0.870 total time= 
[GridSearchCV log: 5-fold CV over colsample_bytree ∈ {0.85, 0.9, 0.95} × subsample ∈ {0.5, 0.55, …, 0.95}; per-fold scores range from ≈0.851 (low subsample values) to ≈0.876, with the best results concentrated at subsample ≥ 0.85.]
{'colsample_bytree': 0.85, 'subsample': 0.95}
xgb_best_params_2 = {'colsample_bytree': 0.85, 'subsample': 0.95}
xgb_class_optimized_2 = XGBClassifier(max_depth=13, min_child_weight=1, gamma=0.2)
xgb_class_optimized_2.set_params(**xgb_best_params_2)
xgb_class_optimized_2.fit(X_train, y_train)
xgb_class_optimized_2_model_scores = compute_performance([xgb_class_optimized_2], X_train, y_train, scoring_funs)
xgb_class_optimized_2_model_scores
std of 0.871543552201901 of type accuracy is 0.0014304526253521243
std of 0.8773322970284785 of type precision is 0.0029382446098105528
std of 0.869680601833403 of type recall is 0.0030863877294322814
{'XGBClassifier': {'accuracy': 0.871543552201901,
'precision': 0.8773322970284785,
'recall': 0.869680601833403}}
Unfortunately, performance remained unchanged. The values of these last two hyperparameters will still be used because, even if only marginally, they make the model less prone to overfitting than the default values (both equal to 1).
With training and validation complete, it is time to see how the models behave on concrete, unseen scenarios via the test set.
names_of_classifiers = ['Random forest', 'K-neighbors', 'SGD', 'Decision tree', 'Naive bayes', 'XGBoost']
classifiers = [rnd_forest_class_optimized, knn_class_optimized, sgd_clf, decision_tree_class, mnb_class, xgb_class_optimized_2]
colors = ['darkorange', 'darkgreen', 'cornflowerblue', 'firebrick', 'darkgoldenrod', 'lightseagreen']
def get_test_set(classifier_name):
    # Some models were trained on preprocessed features:
    # return the matching version of the test set
    return {
        'Naive bayes': discretized_X_test,
        'SGD': scaled_X_test,
        'K-neighbors': normalized_X_test,
    }.get(classifier_name, X_test)
def get_score(y_class_test, y_class_pred, score_type):
    if score_type == 'accuracy':
        return accuracy_score(y_class_test, y_class_pred)
    elif score_type == 'precision':
        return precision_score(y_class_test, y_class_pred)
    else:
        return recall_score(y_class_test, y_class_pred)
def plot_bar(names_of_classifiers, test_scores, title, score_type, colors):
    plt.figure(figsize=(14, 7))
    plt.ylabel(score_type)
    plt.title(title)
    plt.bar(names_of_classifiers, test_scores, color=colors, width=0.4, alpha=0.7)
    plt.show()
test_scores = np.empty(len(classifiers), dtype=float)
for score_type in scoring_funs:
    for j in range(len(classifiers)):
        X_test_1 = get_test_set(names_of_classifiers[j])
        y_class_pred = classifiers[j].predict(X_test_1)
        test_scores[j] = get_score(y_test, y_class_pred, score_type)
    plot_bar(names_of_classifiers, test_scores, str(score_type) + ' of classifiers', score_type, colors)
Surprisingly, k-nearest neighbors keeps pace with the ensemble models, random forest and XGBoost; considering that it is a single classifier, the results it obtains cement it as an excellent model. The excellence of random forest and XGBoost is further confirmed, as the validation scores had already suggested. Decision tree takes 4th place, lacking the weapons to compete with the ensembles and k-nearest neighbors. SGD and naive Bayes occupy the last places in the ranking with fairly decent scores, despite operating on "approximated" data.
From the bar charts alone it is hard to read off the precise values of the 3 metrics. To compare the results from the previous phase with those just computed, we visualize each classifier's classification report as a heatmap.
def plot_heatmap(classification_report, title):
    plt.figure(figsize=(8, 5))
    # drop the support row, then the average rows, keeping per-class metrics
    temp = DataFrame(classification_report).iloc[:-1, :].T
    temp.drop(['micro avg', 'macro avg', 'weighted avg'], axis=0, inplace=True)
    sns.heatmap(temp, annot=True)
    plt.title(title, fontsize=16, fontname="Times New Roman Bold")
labels = np.arange(2)  # binary task: 0 = CT, 1 = T
names = ['CT', 'T']
axes = ['precision', 'recall', 'f1-score']
y_pred_random_forest_optimized = rnd_forest_class_optimized.predict(X_test)
random_forest_optimized_report = classification_report(y_test, y_pred_random_forest_optimized, labels=labels, target_names=names, output_dict=True)
plot_heatmap(random_forest_optimized_report, 'Classification report of tuned random forest')
As stated earlier, the results obtained with this classifier are decidedly optimal; they are even marginally higher than what was observed during validation. Let us compare them with a random forest using the default hyperparameters.
y_pred_random_forest = rnd_class.predict(X_test)
random_forest_report = classification_report(y_test, y_pred_random_forest, labels=labels, target_names=names, output_dict=True)
plot_heatmap(random_forest_report, 'Classification report of random forest')
The differences are rather marginal; after all, the baseline scores were already in the optimal range.
y_pred_naive_bayes = mnb_class.predict(discretized_X_test)
naive_bayes_report = classification_report(y_test, y_pred_naive_bayes, labels=labels, target_names=names, output_dict=True)
plot_heatmap(naive_bayes_report, 'Classification report of naive bayes')
The decent results from validation are confirmed.
y_pred_sgd = sgd_clf.predict(scaled_X_test)
sgd_report = classification_report(y_test, y_pred_sgd, labels=labels, target_names=names, output_dict=True)
plot_heatmap(sgd_report, 'Classification report of SGD')
Compared to naive Bayes, the scores are roughly comparable. Taking into account the time each model spends in the learning phase, one might lean towards naive Bayes because of its noticeably faster training; keep in mind, however, that SGD's performance could probably improve with a more thorough hyperparameter tuning, something that is not possible with naive Bayes, which has no hyperparameters to optimize.
y_pred_knn_optimized = knn_class_optimized.predict(normalized_X_test)
knn_optimized_report = classification_report(y_test, y_pred_knn_optimized, labels=labels, target_names=names, output_dict=True)
plot_heatmap(knn_optimized_report, 'Classification report of tuned KNN')
The scores are nearly equivalent to those of random forest. Here too we can see how they diverge from the baseline classifier.
y_pred_knn = knn_class.predict(normalized_X_test)
knn_report = classification_report(y_test, y_pred_knn, labels=labels, target_names=names, output_dict=True)
plot_heatmap(knn_report, 'Classification report of KNN')
The gap is quite pronounced compared to what was seen between the tuned and the baseline random forest.
y_pred_decision_tree = decision_tree_class.predict(X_test)
decision_tree_report = classification_report(y_test, y_pred_decision_tree, labels=labels, target_names=names, output_dict=True)
plot_heatmap(decision_tree_report, 'Classification report of Decision tree')
The scores are good enough to grant the model a place in the optimal range.
y_pred_xgb_optimized_2 = xgb_class_optimized_2.predict(X_test)
xgb_class_optimized_2_report = classification_report(y_test, y_pred_xgb_optimized_2, labels=labels, target_names=names, output_dict=True)
plot_heatmap(xgb_class_optimized_2_report, 'Classification report of tuned XGBoost')
The scores appear slightly higher than those seen during validation (roughly 0.01 more).
y_pred_xgb = xgb_class.predict(X_test)
xgb_class_report = classification_report(y_test, y_pred_xgb, labels=labels, target_names=names, output_dict=True)
plot_heatmap(xgb_class_report, 'Classification report of XGBoost')
The performance jump achieved by the tuned version is quite marked, especially for recall, which saw a 10% increase.
from sklearn.metrics import roc_curve, auc

def compute_fpr_tpr(classifiers, names_of_classifiers):
    classifiers_fpr = dict()
    classifiers_tpr = dict()
    for i in range(len(classifiers)):
        # probability assigned to the positive class
        classifier_predict_probabilities = classifiers[i].predict_proba(get_test_set(names_of_classifiers[i]))[:, 1]
        classifiers_fpr[i], classifiers_tpr[i], _ = roc_curve(y_test, classifier_predict_probabilities)
    return classifiers_fpr, classifiers_tpr
classifiers_fpr, classifiers_tpr = compute_fpr_tpr(classifiers, names_of_classifiers)
def set_plot_labels():
    plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC Curve')
    plt.legend(loc="lower right")
    plt.show()
def plot_roc_auc(classifiers_fpr, classifiers_tpr, names_of_classifiers, colors):
    plt.figure(figsize=(16, 10))
    for i in range(len(names_of_classifiers)):
        plt.plot(classifiers_fpr[i], classifiers_tpr[i], color=colors[i], lw=2,
                 label=names_of_classifiers[i] + ' (area = %0.2f)' % auc(classifiers_fpr[i], classifiers_tpr[i]))
    set_plot_labels()
from sklearn.metrics import roc_curve, auc
plot_roc_auc(classifiers_fpr, classifiers_tpr, names_of_classifiers, colors)
The ROC curves simply confirm what was just discussed with the bar charts.
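As a self-contained illustration of how each curve and its area are obtained (with toy labels and scores, not the notebook's data):

```python
from sklearn.metrics import roc_curve, auc

# toy ground truth and predicted positive-class probabilities (illustrative values)
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr, thresholds = roc_curve(y_true, y_scores)
roc_auc = auc(fpr, tpr)  # area under the ROC curve
print(roc_auc)           # 0.75
```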
If we had to decide which model to choose, the answer would lean towards random forest: its performance is the best recorded among the compared models, and its training is also faster than that of the other two models on the podium (XGBoost and k-neighbors, respectively).
Artificial neural networks are computational models made up of neurons whose behavior is inspired by biological neural networks. In practice, an ANN is a graph whose nodes represent the neurons. The graph has weighted edges, and the goal of the learning algorithm is to choose weights and biases so that the error made by the network is as small as possible.
Neural networks find applications in a great many fields, such as control systems and machine and deep learning, but also in classification tasks.
Image(url='https://www.researchgate.net/profile/Sandip-Lahiri/publication/26614896/figure/fig1/AS:310007494135809@1450922954279/A-schematic-diagram-of-artificial-neural-network-and-architecture-of-the-feed-forward.png',
width=700,height=500)
The behavior of a single node is fairly simple: once the neuron gathers its inputs ($\vec{w} \cdot \vec{x}$) it applies an activation function; that value is the neuron's output. In a network of many neurons, these are organized into layers, so that the output of one layer becomes the input of the next.
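A minimal numpy sketch of the single neuron just described, with illustrative weights and inputs and the sigmoid as the activation function:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])   # inputs (illustrative values)
w = np.array([0.1, 0.2, 0.3])    # weights (illustrative values)
b = 0.0                          # bias

z = w @ x + b      # weighted sum of the inputs
a = sigmoid(z)     # activation applied to the weighted sum -> neuron output
```

In a layered network, `a` would then be fed forward as one of the inputs of the next layer.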
Among the simplest forms of neural networks are feedforward networks, in which the signals travel in a single direction. In these networks, training consists of two phases: a forward pass, in which the inputs are propagated through the network to produce an output, and a backward pass (backpropagation), in which the error is propagated back to update the weights.
To update the weights we need the learning rate $\lambda$ (chosen by the user) and the "dependence" of the error on each weight:
$$ w_{ij}^{l}=w_{ij}^{l}-\lambda\frac{\partial E}{\partial w_{ij}^{l}} $$How do we obtain the partial derivative? Using the chain rule we get
$$ \frac{\partial E}{\partial w_{ij}^l}=\frac{\partial E}{\partial a_{i}^{l}}\frac{\partial a_{i}^{l}}{\partial z_{i}^{l}}\frac{\partial z_{i}^{l}}{\partial w_{ij}^{l}} $$where $a_{i}^{l}$ is the output of the $i$-th neuron at layer $l$, and $z_{i}^{l}$ is the output of the $i$-th neuron at layer $l$ before the activation function is applied.
How these values are computed then depends on the characteristics of the network, in particular on which activation function is used. Continuous functions (such as the sigmoid and the hyperbolic tangent) are usually preferred, to keep the derivative simple to compute.
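To make the chain rule above concrete, here is a sketch for a single sigmoid neuron with squared-error loss, with the analytic gradient checked against a numerical one (all values are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y, w, lam = 1.5, 1.0, 0.8, 0.1   # input, target, weight, learning rate

z = w * x                  # pre-activation
a = sigmoid(z)             # activation (neuron output)
E = 0.5 * (a - y) ** 2     # squared-error loss

# chain rule: dE/dw = dE/da * da/dz * dz/dw
grad = (a - y) * a * (1 - a) * x

w_updated = w - lam * grad   # the gradient-descent update rule from the text

# numerical check of the analytic gradient
eps = 1e-6
E_plus = 0.5 * (sigmoid((w + eps) * x) - y) ** 2
E_minus = 0.5 * (sigmoid((w - eps) * x) - y) ** 2
grad_num = (E_plus - E_minus) / (2 * eps)
```

Since the output undershoots the target here, the gradient is negative and the update increases the weight, as expected.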
We therefore decided to use a neural network for our classification task. To this end we must import the tensorflow and keras libraries, which will let us design, build, train, and test the network.
import tensorflow as tf
import keras
from keras.models import Sequential
from keras.layers import Dense
First of all we create the sets of interest, applying scaling to improve performance.
X_TMP, X_TEST, Y_TMP, Y_TEST = train_test_split(ds_x, ds_y, test_size=0.2, random_state=42, stratify=ds_y)
X_TRAIN, X_VAL, Y_TRAIN, Y_VAL = train_test_split(X_TMP, Y_TMP, test_size=0.25, random_state=42, stratify=Y_TMP)
standard_scaler = StandardScaler()
# fit the scaler on the training data only, then reuse its statistics
scaled_X_TRAIN = DataFrame(standard_scaler.fit_transform(X_TRAIN))
scaled_X_TEST = DataFrame(standard_scaler.transform(X_TEST))
scaled_X_VAL = DataFrame(standard_scaler.transform(X_VAL))
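The key point of the cell above is that the scaler's statistics must come from the training data alone; a tiny sketch of the pattern on toy data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train_toy = np.array([[0.0], [2.0], [4.0]])
X_test_toy = np.array([[2.0]])

scaler = StandardScaler().fit(X_train_toy)    # mean/std computed from training data only
X_test_scaled = scaler.transform(X_test_toy)  # the same statistics are reused on the test set
# 2.0 equals the training mean, so it maps to 0
```

Calling `fit_transform` on the test set instead would recompute the statistics there, leaking test-set information into the preprocessing.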
The first version of the network we will create is fairly simple and follows a naive approach, choosing "by eye" what the best parameters might be.
In particular, keras lets us specify how many layers to insert and, for each of them, the number of neurons and the activation function to use. The network is described below.
For training we used 200 epochs and a batch size of 1024, where an epoch is one full pass over the training set and the batch size is the number of samples processed before each weight update.
These values were chosen to avoid excessive execution time, especially regarding the batch size: too fine a granularity would have made training take too long (in general, ANNs have very long training times; moreover, the dataset at hand had high cardinality). The validation set is used to assess the quality of the result.
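The "72/72" counter that appears in the training log is simply the number of batches per epoch; a quick sanity check, assuming a training-set size of roughly 73,000 rows (the exact size is not restated here):

```python
import math

n_train = 73_000     # assumed approximate training-set size
batch_size = 1024

steps_per_epoch = math.ceil(n_train / batch_size)
print(steps_per_epoch)  # 72
```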
As for the activation function, we chose the ReLU, $relu(x)=\max(0,x)$, for the first two layers, and the sigmoid for the last one:
$$ sigmoid(x)=\frac{1}{1+e^{-x}} $$This choice stems from the fact that the ReLU mitigates the vanishing-gradient problem, which makes networks learn too slowly or not at all. For nodes using the sigmoid, whose derivative is $$ \sigma^{'}(x)=\sigma(x)(1-\sigma(x)),$$ the maximum value of the derivative is 0.25.
Image(url='https://www.kdnuggets.com/wp-content/uploads/jacob-vanishing-gradient-2.jpg')
When the network has several layers, the product of these derivatives keeps shrinking until the partial derivative of the loss function approaches 0: the gradient vanishes. The ReLU avoids this, as well as the problem of inactive neurons. The sigmoid is usually used in the last layer for binary classification problems (while the softmax is usually indicated for multi-class tasks).
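The 0.25 bound on the sigmoid's derivative mentioned above can be verified numerically:

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 10001)   # symmetric grid including x = 0
s = 1.0 / (1.0 + np.exp(-x))          # sigmoid
ds = s * (1.0 - s)                    # its derivative sigma(x) * (1 - sigma(x))
print(ds.max())  # 0.25, attained at x = 0
```

With, say, 10 stacked sigmoid layers, the product of such derivatives is at most $0.25^{10} \approx 10^{-6}$, which is why early layers barely learn.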
To evaluate performance, besides using the validation set, we adopted accuracy as the reference metric.
model = Sequential()
model.add(Dense(42, activation='relu', input_dim=42))
model.add(Dense(84, activation='relu'))
model.add(Dense(1, activation='sigmoid', kernel_initializer="normal"))
model.compile(loss='binary_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
history = model.fit(scaled_X_TRAIN, Y_TRAIN, validation_data=(scaled_X_VAL, Y_VAL), epochs=200, batch_size=1024, verbose=1)
[Keras training log, epochs 1–126 of 200 (72 batches per epoch): loss falls from 0.6910 to 0.4729 and val_accuracy rises from 0.5728 to ≈0.7424, with both still improving slowly at the cut-off.]
127/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4728 - accuracy: 0.7470 - val_loss: 0.4740 - val_accuracy: 0.7427 Epoch 128/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4726 - accuracy: 0.7470 - val_loss: 0.4738 - val_accuracy: 0.7433 Epoch 129/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4725 - accuracy: 0.7472 - val_loss: 0.4737 - val_accuracy: 0.7432 Epoch 130/200 72/72 [==============================] - 0s 5ms/step - loss: 0.4723 - accuracy: 0.7474 - val_loss: 0.4736 - val_accuracy: 0.7433 Epoch 131/200 72/72 [==============================] - 0s 5ms/step - loss: 0.4722 - accuracy: 0.7474 - val_loss: 0.4735 - val_accuracy: 0.7433 Epoch 132/200 72/72 [==============================] - 0s 6ms/step - loss: 0.4720 - accuracy: 0.7476 - val_loss: 0.4734 - val_accuracy: 0.7432 Epoch 133/200 72/72 [==============================] - 0s 5ms/step - loss: 0.4719 - accuracy: 0.7477 - val_loss: 0.4733 - val_accuracy: 0.7431 Epoch 134/200 72/72 [==============================] - 0s 5ms/step - loss: 0.4717 - accuracy: 0.7479 - val_loss: 0.4731 - val_accuracy: 0.7432 Epoch 135/200 72/72 [==============================] - 0s 6ms/step - loss: 0.4716 - accuracy: 0.7478 - val_loss: 0.4730 - val_accuracy: 0.7433 Epoch 136/200 72/72 [==============================] - 0s 5ms/step - loss: 0.4714 - accuracy: 0.7480 - val_loss: 0.4729 - val_accuracy: 0.7433 Epoch 137/200 72/72 [==============================] - 0s 5ms/step - loss: 0.4713 - accuracy: 0.7480 - val_loss: 0.4728 - val_accuracy: 0.7433 Epoch 138/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4712 - accuracy: 0.7480 - val_loss: 0.4727 - val_accuracy: 0.7434 Epoch 139/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4710 - accuracy: 0.7481 - val_loss: 0.4726 - val_accuracy: 0.7433 Epoch 140/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4709 - accuracy: 0.7483 - val_loss: 0.4725 - val_accuracy: 0.7435 
Epoch 141/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4708 - accuracy: 0.7483 - val_loss: 0.4724 - val_accuracy: 0.7433 Epoch 142/200 72/72 [==============================] - 0s 5ms/step - loss: 0.4706 - accuracy: 0.7481 - val_loss: 0.4723 - val_accuracy: 0.7431 Epoch 143/200 72/72 [==============================] - 0s 5ms/step - loss: 0.4705 - accuracy: 0.7487 - val_loss: 0.4722 - val_accuracy: 0.7438 Epoch 144/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4704 - accuracy: 0.7484 - val_loss: 0.4721 - val_accuracy: 0.7438 Epoch 145/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4703 - accuracy: 0.7485 - val_loss: 0.4720 - val_accuracy: 0.7435 Epoch 146/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4701 - accuracy: 0.7487 - val_loss: 0.4719 - val_accuracy: 0.7438 Epoch 147/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4700 - accuracy: 0.7485 - val_loss: 0.4718 - val_accuracy: 0.7440 Epoch 148/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4699 - accuracy: 0.7487 - val_loss: 0.4717 - val_accuracy: 0.7443 Epoch 149/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4697 - accuracy: 0.7491 - val_loss: 0.4716 - val_accuracy: 0.7444 Epoch 150/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4696 - accuracy: 0.7493 - val_loss: 0.4715 - val_accuracy: 0.7435 Epoch 151/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4695 - accuracy: 0.7493 - val_loss: 0.4714 - val_accuracy: 0.7440 Epoch 152/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4694 - accuracy: 0.7493 - val_loss: 0.4713 - val_accuracy: 0.7444 Epoch 153/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4692 - accuracy: 0.7497 - val_loss: 0.4712 - val_accuracy: 0.7444 Epoch 154/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4691 - accuracy: 0.7493 - val_loss: 0.4711 - val_accuracy: 
0.7446 Epoch 155/200 72/72 [==============================] - 0s 5ms/step - loss: 0.4690 - accuracy: 0.7495 - val_loss: 0.4710 - val_accuracy: 0.7451 Epoch 156/200 72/72 [==============================] - 0s 5ms/step - loss: 0.4689 - accuracy: 0.7496 - val_loss: 0.4709 - val_accuracy: 0.7448 Epoch 157/200 72/72 [==============================] - 0s 5ms/step - loss: 0.4688 - accuracy: 0.7496 - val_loss: 0.4708 - val_accuracy: 0.7449 Epoch 158/200 72/72 [==============================] - 0s 6ms/step - loss: 0.4686 - accuracy: 0.7498 - val_loss: 0.4708 - val_accuracy: 0.7451 Epoch 159/200 72/72 [==============================] - 0s 7ms/step - loss: 0.4685 - accuracy: 0.7499 - val_loss: 0.4707 - val_accuracy: 0.7455 Epoch 160/200 72/72 [==============================] - 0s 6ms/step - loss: 0.4684 - accuracy: 0.7499 - val_loss: 0.4706 - val_accuracy: 0.7454 Epoch 161/200 72/72 [==============================] - 0s 5ms/step - loss: 0.4683 - accuracy: 0.7499 - val_loss: 0.4705 - val_accuracy: 0.7456 Epoch 162/200 72/72 [==============================] - 0s 5ms/step - loss: 0.4682 - accuracy: 0.7502 - val_loss: 0.4704 - val_accuracy: 0.7458 Epoch 163/200 72/72 [==============================] - 0s 5ms/step - loss: 0.4680 - accuracy: 0.7501 - val_loss: 0.4703 - val_accuracy: 0.7456 Epoch 164/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4679 - accuracy: 0.7499 - val_loss: 0.4702 - val_accuracy: 0.7456 Epoch 165/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4678 - accuracy: 0.7505 - val_loss: 0.4702 - val_accuracy: 0.7459 Epoch 166/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4677 - accuracy: 0.7505 - val_loss: 0.4701 - val_accuracy: 0.7462 Epoch 167/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4676 - accuracy: 0.7507 - val_loss: 0.4700 - val_accuracy: 0.7461 Epoch 168/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4675 - accuracy: 0.7504 - val_loss: 0.4699 - 
val_accuracy: 0.7462 Epoch 169/200 72/72 [==============================] - 0s 5ms/step - loss: 0.4674 - accuracy: 0.7510 - val_loss: 0.4699 - val_accuracy: 0.7463 Epoch 170/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4673 - accuracy: 0.7508 - val_loss: 0.4698 - val_accuracy: 0.7463 Epoch 171/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4672 - accuracy: 0.7506 - val_loss: 0.4697 - val_accuracy: 0.7465 Epoch 172/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4671 - accuracy: 0.7509 - val_loss: 0.4696 - val_accuracy: 0.7464 Epoch 173/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4670 - accuracy: 0.7510 - val_loss: 0.4696 - val_accuracy: 0.7462 Epoch 174/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4669 - accuracy: 0.7513 - val_loss: 0.4695 - val_accuracy: 0.7463 Epoch 175/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4668 - accuracy: 0.7511 - val_loss: 0.4694 - val_accuracy: 0.7464 Epoch 176/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4667 - accuracy: 0.7512 - val_loss: 0.4693 - val_accuracy: 0.7467 Epoch 177/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4665 - accuracy: 0.7515 - val_loss: 0.4693 - val_accuracy: 0.7469 Epoch 178/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4664 - accuracy: 0.7516 - val_loss: 0.4692 - val_accuracy: 0.7469 Epoch 179/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4663 - accuracy: 0.7514 - val_loss: 0.4691 - val_accuracy: 0.7470 Epoch 180/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4662 - accuracy: 0.7519 - val_loss: 0.4691 - val_accuracy: 0.7469 Epoch 181/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4661 - accuracy: 0.7518 - val_loss: 0.4690 - val_accuracy: 0.7471 Epoch 182/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4660 - accuracy: 0.7521 - val_loss: 0.4689 
- val_accuracy: 0.7476 Epoch 183/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4659 - accuracy: 0.7520 - val_loss: 0.4688 - val_accuracy: 0.7475 Epoch 184/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4658 - accuracy: 0.7521 - val_loss: 0.4688 - val_accuracy: 0.7476 Epoch 185/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4657 - accuracy: 0.7524 - val_loss: 0.4687 - val_accuracy: 0.7477 Epoch 186/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4656 - accuracy: 0.7524 - val_loss: 0.4686 - val_accuracy: 0.7479 Epoch 187/200 72/72 [==============================] - 0s 5ms/step - loss: 0.4656 - accuracy: 0.7523 - val_loss: 0.4685 - val_accuracy: 0.7480 Epoch 188/200 72/72 [==============================] - 0s 5ms/step - loss: 0.4654 - accuracy: 0.7523 - val_loss: 0.4685 - val_accuracy: 0.7481 Epoch 189/200 72/72 [==============================] - 0s 6ms/step - loss: 0.4654 - accuracy: 0.7525 - val_loss: 0.4684 - val_accuracy: 0.7482 Epoch 190/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4653 - accuracy: 0.7527 - val_loss: 0.4684 - val_accuracy: 0.7481 Epoch 191/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4652 - accuracy: 0.7526 - val_loss: 0.4683 - val_accuracy: 0.7482 Epoch 192/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4651 - accuracy: 0.7528 - val_loss: 0.4682 - val_accuracy: 0.7484 Epoch 193/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4650 - accuracy: 0.7529 - val_loss: 0.4682 - val_accuracy: 0.7481 Epoch 194/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4649 - accuracy: 0.7527 - val_loss: 0.4681 - val_accuracy: 0.7483 Epoch 195/200 72/72 [==============================] - 0s 4ms/step - loss: 0.4648 - accuracy: 0.7530 - val_loss: 0.4680 - val_accuracy: 0.7484 Epoch 196/200 72/72 [==============================] - 0s 5ms/step - loss: 0.4647 - accuracy: 0.7531 - val_loss: 
0.4680 - val_accuracy: 0.7483 Epoch 197/200 72/72 [==============================] - 0s 5ms/step - loss: 0.4646 - accuracy: 0.7531 - val_loss: 0.4679 - val_accuracy: 0.7482 Epoch 198/200 72/72 [==============================] - 0s 5ms/step - loss: 0.4645 - accuracy: 0.7532 - val_loss: 0.4679 - val_accuracy: 0.7486 Epoch 199/200 72/72 [==============================] - 0s 5ms/step - loss: 0.4644 - accuracy: 0.7535 - val_loss: 0.4678 - val_accuracy: 0.7489 Epoch 200/200 72/72 [==============================] - 0s 6ms/step - loss: 0.4643 - accuracy: 0.7536 - val_loss: 0.4677 - val_accuracy: 0.7489
Once the network has been trained, we can evaluate its error on the test set. As can be seen below, the performance is satisfactory, and there is no sign of overfitting with respect to the training set.
_, accuracy = model.evaluate(scaled_X_TEST, Y_TEST)
765/765 [==============================] - 1s 2ms/step - loss: 0.4695 - accuracy: 0.7441
We can also take a look at how the loss function and the accuracy evolved during training.
plt.figure(num=None, figsize=(8, 6), dpi=80)
plt.title('Loss Function')
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='validation')
plt.xlabel('Epoch')
plt.legend()
plt.show()
plt.figure(num=None, figsize=(8, 6), dpi=80)
plt.title('Accuracy')
plt.plot(history.history['accuracy'], label='train')
plt.plot(history.history['val_accuracy'], label='validation')
plt.xlabel('Epoch')
plt.legend()
plt.show()
These plots also suggest that the model is not overfitting: the training and validation curves remain close to each other throughout training.
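One way to put a number on this (a minimal sketch; the helper name `generalization_gap` is hypothetical, and the figures are taken from the last epoch logged above) is to compare the final training and validation losses:

```python
# Hypothetical helper: the generalization gap is the difference between the
# final validation loss and the final training loss; a value near zero
# indicates the model is not overfitting the training set.
def generalization_gap(train_loss, val_loss):
    return val_loss[-1] - train_loss[-1]

# Final epoch of the run above: loss 0.4643 vs. val_loss 0.4677.
print(round(generalization_gap([0.4643], [0.4677]), 4))  # 0.0034
```

A gap this small, together with the parallel curves in the plots, supports the no-overfitting conclusion.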
Of course, the hyperparameters above were chosen without any precise criterion. We will now carry out a more systematic study.
from keras_tuner import RandomSearch
Keras offers the keras-tuner framework for tuning the hyperparameters of ANNs. First, we create a function that receives the hyperparameters as input; its body defines the structure of the network, using the hp object at each choice point.
The choice points are the number of neurons per layer (the interval [6, 256] was considered), the number of hidden layers ([1, 5]), and the activation function of each layer.
def build_model(hp):
    model = Sequential()
    model.add(Dense(hp.Int("input_units", min_value=6, max_value=256, step=32),
                    activation=hp.Choice('activation_input', values=['relu', 'sigmoid']),
                    input_dim=42))
    for i in range(hp.Int("n_layers", 1, 5)):
        model.add(Dense(hp.Int(f"conv_{i}_units", min_value=6, max_value=256, step=32),
                        activation=hp.Choice(f"activation_{i}", values=['relu', 'sigmoid'])))
    model.add(Dense(1, activation=hp.Choice('activation_output', values=['relu', 'sigmoid', 'softmax'])))
    model.compile(loss='binary_crossentropy',
                  optimizer='sgd',
                  metrics=['accuracy'])
    return model
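As a quick sanity check (a minimal sketch, independent of Keras), the grid that an integer hyperparameter with min_value=6, max_value=256, step=32 walks over can be enumerated by hand; the unit counts that show up in the tuner's results all come from this grid:

```python
# Values of an integer hyperparameter with min_value=6, max_value=256,
# step=32: start at the minimum and add the step while staying <= maximum.
grid = list(range(6, 256 + 1, 32))
print(grid)  # [6, 38, 70, 102, 134, 166, 198, 230]
```

So each Dense layer has 8 possible widths, which is the figure used below when sizing the search space.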
RandomSearch was used to look for the best hyperparameters (the search space is extremely large, so trying every possible combination would be infeasible). The objective to maximize is the accuracy on the validation set. The number of trials is set to 50, with one execution per trial.
tuner = RandomSearch(build_model,
                     objective='val_accuracy',
                     max_trials=50,
                     executions_per_trial=1,
                     directory='tuner1')
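To see why an exhaustive search is off the table, here is a back-of-the-envelope count of the configurations defined in build_model (a sketch under the ranges stated above, ignoring the inactive parameters keras-tuner keeps around for unused layers):

```python
# Back-of-the-envelope size of build_model's search space:
# 8 unit values and 2 activations per hidden layer, 3 output activations,
# and 1 to 5 hidden layers (the input layer also picks units + activation).
units = len(range(6, 257, 32))     # 8 possible widths per Dense layer
layer_choices = units * 2          # 16 configurations per hidden/input layer
out_acts = 3                       # output activation choices

total = sum(layer_choices * layer_choices**n * out_acts for n in range(1, 6))
print(total)  # 53687040
```

Over fifty million configurations against the 50 trials actually run: random search samples only a vanishingly small fraction of the space.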
The tuning is, of course, performed on the training and validation sets, with 100 epochs and a batch size of 1024.
tuner.search(
    x=scaled_X_TRAIN,
    y=Y_TRAIN,
    epochs=100,
    batch_size=1024,
    validation_data=(scaled_X_VAL, Y_VAL))
Trial 50 Complete [00h 01m 36s]
val_accuracy: 0.7384389042854309
Best val_accuracy So Far: 0.7624402046203613
Total elapsed time: 01h 33m 56s
INFO:tensorflow:Oracle triggered exit
Once the tuning has finished, we extract the hyperparameters of the best solution.
tuner.get_best_hyperparameters()[0].values
{'activation_0': 'relu',
'activation_1': 'relu',
'activation_2': 'relu',
'activation_3': 'relu',
'activation_input': 'relu',
'activation_output': 'relu',
'conv_0_units': 230,
'conv_1_units': 6,
'conv_2_units': 6,
'conv_3_units': 6,
'input_units': 230,
'n_layers': 4}
With results_summary() we can display the tuner's complete report.
tuner.results_summary()
Results summary
Results in tuner1/untitled_project
Showing 10 best trials
<keras_tuner.engine.objective.Objective object at 0x7efed1ad9890>
Trial summary
Hyperparameters: input_units: 230, activation_input: relu, n_layers: 4, conv_0_units: 230, activation_0: relu, activation_output: relu, conv_1_units: 6, activation_1: relu, conv_2_units: 6, activation_2: relu, conv_3_units: 6, activation_3: relu
Score: 0.7624402046203613
Trial summary
Hyperparameters: input_units: 198, activation_input: relu, n_layers: 1, conv_0_units: 38, activation_0: relu, activation_output: sigmoid, conv_1_units: 6, activation_1: relu, conv_2_units: 134, activation_2: relu, conv_3_units: 70, activation_3: relu, conv_4_units: 230, activation_4: sigmoid
Score: 0.7533630728721619
Trial summary
Hyperparameters: input_units: 166, activation_input: relu, n_layers: 2, conv_0_units: 38, activation_0: relu, activation_output: sigmoid, conv_1_units: 230, activation_1: relu, conv_2_units: 38, activation_2: relu, conv_3_units: 6, activation_3: relu, conv_4_units: 38, activation_4: sigmoid
Score: 0.7509506344795227
Trial summary
Hyperparameters: input_units: 198, activation_input: relu, n_layers: 3, conv_0_units: 166, activation_0: sigmoid, activation_output: sigmoid, conv_1_units: 6, activation_1: relu, conv_2_units: 70, activation_2: relu, conv_3_units: 70, activation_3: sigmoid, conv_4_units: 166, activation_4: sigmoid
Score: 0.7475978136062622
Trial summary
Hyperparameters: input_units: 70, activation_input: relu, n_layers: 1, conv_0_units: 230, activation_0: relu, activation_output: sigmoid, conv_1_units: 134, activation_1: relu, conv_2_units: 134, activation_2: sigmoid, conv_3_units: 70, activation_3: sigmoid
Score: 0.7451445460319519
Trial summary
Hyperparameters: input_units: 166, activation_input: relu, n_layers: 1, conv_0_units: 38, activation_0: sigmoid, activation_output: sigmoid, conv_1_units: 6, activation_1: sigmoid, conv_2_units: 166, activation_2: relu, conv_3_units: 102, activation_3: sigmoid, conv_4_units: 70, activation_4: relu
Score: 0.7421188354492188
Trial summary
Hyperparameters: input_units: 102, activation_input: sigmoid, n_layers: 2, conv_0_units: 198, activation_0: relu, activation_output: sigmoid, conv_1_units: 134, activation_1: sigmoid, conv_2_units: 166, activation_2: relu, conv_3_units: 134, activation_3: sigmoid, conv_4_units: 230, activation_4: relu
Score: 0.7417508363723755
Trial summary
Hyperparameters: input_units: 102, activation_input: sigmoid, n_layers: 5, conv_0_units: 134, activation_0: relu, activation_output: sigmoid, conv_1_units: 134, activation_1: sigmoid, conv_2_units: 6, activation_2: relu, conv_3_units: 134, activation_3: relu, conv_4_units: 134, activation_4: relu
Score: 0.7415463924407959
Trial summary
Hyperparameters: input_units: 38, activation_input: relu, n_layers: 1, conv_0_units: 134, activation_0: sigmoid, activation_output: sigmoid, conv_1_units: 102, activation_1: sigmoid, conv_2_units: 6, activation_2: sigmoid, conv_3_units: 38, activation_3: relu, conv_4_units: 230, activation_4: relu
Score: 0.7400743961334229
Trial summary
Hyperparameters: input_units: 166, activation_input: relu, n_layers: 2, conv_0_units: 230, activation_0: sigmoid, activation_output: sigmoid, conv_1_units: 70, activation_1: sigmoid, conv_2_units: 102, activation_2: sigmoid, conv_3_units: 70, activation_3: sigmoid
Score: 0.7397063970565796
We can now take the selected hyperparameters and build a network with them. Note that the output activation is set to sigmoid, which is the appropriate choice for binary cross-entropy, rather than the relu reported by the best trial.
model = Sequential()
model.add(Dense(230, activation='relu',input_dim=42))
model.add(Dense(230, activation='relu'))
model.add(Dense(6, activation='relu'))
model.add(Dense(6, activation='relu'))
model.add(Dense(6, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
history = model.fit(scaled_X_TRAIN, Y_TRAIN, validation_data=(scaled_X_VAL, Y_VAL), epochs=120, batch_size=1024, verbose=1)
Epoch 1/120 72/72 [==============================] - 6s 78ms/step - loss: 0.6936 - accuracy: 0.5095 - val_loss: 0.6922 - val_accuracy: 0.5144
[... epochs 2-110 omitted; training and validation loss decrease steadily and in step ...]
Epoch 111/120 72/72 [==============================] - 1s 13ms/step - loss: 0.4553 - accuracy: 0.7690 - val_loss: 0.4689 - val_accuracy: 0.7557
Epoch 112/120 72/72 [==============================] - 1s 13ms/step - loss: 0.4547 -
accuracy: 0.7698 - val_loss: 0.4687 - val_accuracy: 0.7565 Epoch 113/120 72/72 [==============================] - 1s 13ms/step - loss: 0.4542 - accuracy: 0.7707 - val_loss: 0.4685 - val_accuracy: 0.7562 Epoch 114/120 72/72 [==============================] - 1s 13ms/step - loss: 0.4536 - accuracy: 0.7707 - val_loss: 0.4680 - val_accuracy: 0.7559 Epoch 115/120 72/72 [==============================] - 1s 13ms/step - loss: 0.4531 - accuracy: 0.7715 - val_loss: 0.4676 - val_accuracy: 0.7573 Epoch 116/120 72/72 [==============================] - 1s 13ms/step - loss: 0.4525 - accuracy: 0.7711 - val_loss: 0.4676 - val_accuracy: 0.7573 Epoch 117/120 72/72 [==============================] - 1s 13ms/step - loss: 0.4520 - accuracy: 0.7716 - val_loss: 0.4671 - val_accuracy: 0.7566 Epoch 118/120 72/72 [==============================] - 1s 13ms/step - loss: 0.4514 - accuracy: 0.7714 - val_loss: 0.4668 - val_accuracy: 0.7569 Epoch 119/120 72/72 [==============================] - 1s 13ms/step - loss: 0.4509 - accuracy: 0.7723 - val_loss: 0.4665 - val_accuracy: 0.7573 Epoch 120/120 72/72 [==============================] - 1s 13ms/step - loss: 0.4504 - accuracy: 0.7726 - val_loss: 0.4662 - val_accuracy: 0.7577
Let us now evaluate the results.
plt.figure(figsize=(8, 6), dpi=80)
plt.title('Loss Function')
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='validation')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()
plt.figure(figsize=(8, 6), dpi=80)
plt.title('Accuracy')
plt.plot(history.history['accuracy'], label='train')
plt.plot(history.history['val_accuracy'], label='validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
A slight improvement is evident: accuracy rises from roughly 0.74 to almost 0.76. Moreover, the network now completes the training phase much faster; only 120 epochs were needed. Over longer runs, however, the model could start to overfit.
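To guard against overfitting over longer runs, Keras offers the `EarlyStopping` callback, which halts training once the validation loss stops improving and can restore the best weights seen. A minimal sketch on toy data (the array shapes and model topology are illustrative, not the ones used in this project):

```python
import numpy as np
from tensorflow import keras

# Toy binary-classification data standing in for the real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

model = keras.Sequential([
    keras.layers.Input(shape=(10,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Stop when val_loss has not improved for 10 epochs; keep the best weights.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True)

history = model.fit(X, y, validation_split=0.2, epochs=120,
                    callbacks=[early_stop], verbose=0)
print(len(history.history["loss"]))  # may be fewer than 120 epochs
```

With `restore_best_weights=True` the model returned is the one with the lowest validation loss, not the last one trained.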
Finally, let us evaluate the network on the test set.
_, accuracy = model.evaluate(scaled_X_TEST, Y_TEST)
765/765 [==============================] - 1s 1ms/step - loss: 0.4688 - accuracy: 0.7548
Performance improves in this case as well. The absence of overfitting is again evident: the performance seen on the validation set holds up on the test set, so the model generalizes well to unseen data.
The chapter on neural networks concludes this project. The main goal of the work was to use Python libraries to implement Data Mining techniques. To that end, a dataset was chosen on Kaggle, containing snapshots of rounds from a number of CS:GO matches. The reference task was binary classification: based on the other attributes of each tuple, the value of round_winner, i.e. which of the two teams won the round, had to be predicted.
Image(url="https://www.researchgate.net/profile/Karel-Janecka/publication/295858646/figure/fig3/AS:667788658089996@1536224631611/The-current-process-model-for-data-mining-provides-an-overview-of-the-life-cycle-of-a.ppm")
The first step was data visualization: distributions were plotted, several characteristics of the dataset were investigated, and the dataset was enriched (e.g. by adding the total equipment cost of each team). In short, everything needed to fully understand the shape and nature of the data was done, gathering information useful for the later phases of the project.
Next came the preprocessing phase, in which the data were cleaned of null values, outliers and noise, categorical attributes were encoded, and excessively correlated or unimportant features were removed. In this phase, the aim was to improve the quality of the available data as much as possible, so as to improve the classifiers' performance.
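Two of those preprocessing steps, encoding categorical attributes and dropping highly correlated features, can be sketched with pandas. The toy frame and the 0.95 threshold below are illustrative, not the project's actual columns or cutoff:

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the dataset: one categorical column plus numeric ones.
df = pd.DataFrame({
    "map": ["de_dust2", "de_inferno", "de_dust2", "de_nuke"],
    "ct_money": [4000, 10500, 800, 6000],
    "t_money": [3900, 10400, 900, 5800],
    "ct_players_alive": [5, 4, 3, 5],
})

# Encode the categorical attribute with one-hot columns.
df = pd.get_dummies(df, columns=["map"], dtype=float)

# For each pair of features with |correlation| above the threshold,
# drop one of the two. The upper triangle inspects each pair once.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
df = df.drop(columns=to_drop)
print(to_drop)  # t_money tracks ct_money almost perfectly here
```

In the toy data `t_money` is nearly a copy of `ct_money`, so it is the column removed.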
The preprocessed data were then fed to several classifiers, in particular:
For each of these, a baseline version was built first, with no particular attention paid to the hyperparameters. Then, in the tuning phase, those parameters were analyzed in detail, both from a technical-descriptive standpoint and from a practical one: classifier performance was plotted as the parameters varied, in order to identify knees in the curves. These knees were then used in the grid search to define a narrower range of values on which to focus (running grid search over too wide a space would have taken too long given our computational resources). In some cases interesting improvements were observed (e.g. K-neighbors); in others the results were less clear-cut, and in a few cases performance even worsened (e.g. SGD).
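The knee-based narrowing described above can be sketched with scikit-learn's `GridSearchCV`. The data and parameter ranges here are illustrative, assuming the performance plot suggested a knee near n_neighbors ≈ 10:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the preprocessed CS:GO data.
X, y = make_classification(n_samples=600, n_features=10, random_state=42)

# Instead of scanning n_neighbors over 1..100, restrict the grid to a
# narrow band around the knee identified in the performance plot.
param_grid = {
    "n_neighbors": list(range(6, 16, 2)),
    "weights": ["uniform", "distance"],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The smaller grid keeps the number of fitted models (here 10 combinations × 5 folds) manageable on limited hardware.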
After building and optimizing the classifiers, their performance was analyzed (evaluated on the test set). The metrics used were accuracy, recall, precision and F1-score; ROC curves were also plotted for a fuller picture of each classifier's behavior.
$$ Precision = \frac{TP}{TP+FP}$$
$$ Recall = \frac{TP}{TP+FN}$$
$$ F_1 = \frac{2}{\frac{1}{Precision}+\frac{1}{Recall}}$$
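These formulas map directly onto scikit-learn's metric helpers. A small worked example on hypothetical predictions, with TP=3, FP=1, FN=1:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical ground truth and predictions: TP=3, FP=1, FN=1, TN=3.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

p = precision_score(y_true, y_pred)   # 3 / (3 + 1) = 0.75
r = recall_score(y_true, y_pred)      # 3 / (3 + 1) = 0.75
f1 = f1_score(y_true, y_pred)         # harmonic mean of 0.75 and 0.75 = 0.75
print(p, r, f1)
```

Since precision and recall coincide here, their harmonic mean equals them both.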
In the final chapter, the Keras framework was used instead to build a neural network (again applied to the binary classification task presented above). First, an initial version of the network was created, trying to set up its topology as well as possible, albeit without any particular criterion to rely on. Then, with Keras-Tuner, the network was tuned, searching over the number of layers, the number of neurons per layer, activation functions and so on. The optimized version was then tested and showed slight improvements.
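The idea behind that tuning step, trying out topologies and keeping the one with the best validation accuracy, can be shown with a hand-rolled loop over a tiny grid. Keras-Tuner automates exactly this kind of search (and samples the space instead of enumerating it); the toy data and ranges below are illustrative only:

```python
import itertools
import numpy as np
from tensorflow import keras

# Toy data standing in for the preprocessed dataset.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 10)).astype("float32")
y = (X[:, 0] - X[:, 2] > 0).astype("float32")

def build(units, activation, n_layers):
    """Build a small binary classifier with the given topology."""
    model = keras.Sequential([keras.layers.Input(shape=(10,))])
    for _ in range(n_layers):
        model.add(keras.layers.Dense(units, activation=activation))
    model.add(keras.layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Exhaustive loop over width, activation and depth; keep the best config.
best = (None, 0.0)
for units, act, depth in itertools.product([16, 32], ["relu", "tanh"], [1, 2]):
    model = build(units, act, depth)
    hist = model.fit(X, y, validation_split=0.25, epochs=10, verbose=0)
    val_acc = hist.history["val_accuracy"][-1]
    if val_acc > best[1]:
        best = ((units, act, depth), val_acc)
print(best)
```

Keras-Tuner replaces the explicit loop with a `HyperModel`/`hp.Int`/`hp.Choice` search space and strategies such as random search or Hyperband.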
The challenges faced in carrying out this project were many, but all of them extremely useful for a more complete understanding of the discipline: from choosing the dataset (including early attempts that failed because of poor-quality data), to getting to grips with new frameworks, to selecting and optimizing the classifiers, to the effort of obtaining the best possible performance.